page
BUILDING FAST DATA APPLICATIONS
WITH STREAMING DATA
Ryan Betts, CTO
VoltDB
1
page
© 2014 VoltDB PROPRIETARY
 page
AGENDA
•  Fast Data Application Patterns
•  Digging Deeper: Looking at the Data
•  Streaming Approach
•  DB Approach
•  Summary
2
page
© 2014 VoltDB PROPRIETARY
Collect	
   Explore	
  
Analyze	
  
Act	
  
Data leads to
applications

Applications create
more data
3
page
© 2014 VoltDB PROPRIETARY
DATA ARCHITECTURE FOR FAST + BIG DATA

Enterprise Apps
ETL
CR
M
ER
P
Etc
.
Data Lake
(HDFS, etc.)
BIG DATA
SQL on
Hadoop
Map
Reduce
Exploratory
Analytics
BI
Reporting
Fast Operational
Database
FAST DATA
Export
Ingest /
Interactive
Streaming
Analytics
Fast Serve
Analytics
Decisioning
4
page
© 2014 VoltDB PROPRIETARY
IN THE BIG CORNER
Systems facilitating exploration and analytics of large
data sets
5
Example	
  Technologies	
  
Columnar	
  OLAP	
  warehouses	
  
Hadoop	
  Ecosystem	
  
•  MapReduce	
  
•  Hive,	
  Pig	
  
•  SQL.next:	
  Impala,	
  Drill,	
  Shark	
  
Example	
  Applica7ons	
  
•  User	
  segmentaHon	
  &	
  pre-­‐scoring	
  
•  Seasonal	
  trending	
  
•  RecommendaHon	
  matrix	
  calculaHons	
  
•  Building	
  search	
  indexes	
  
•  Data	
  Science:	
  staHsHcal	
  clustering,	
  
Machine	
  learning	
  
page
© 2014 VoltDB PROPRIETARY
IN THE FAST CORNER
Systems facilitating real time ingest, analytics and
decisions against incoming event feeds
6
Example	
  Technologies	
  
•  Streaming	
  frameworks	
  
•  VoltDB	
  
	
  
Example	
  Applica7ons	
  
•  Micro-­‐personalizaHon	
  
•  RecommendaHon	
  serving	
  
•  AlerHng/alarming	
  
•  OperaHonal	
  monitoring	
  
•  Data	
  enrichment	
  (ETL	
  eliminaHon)	
  
•  High	
  throughput	
  authorizaHon	
  
•  Ex:	
  API	
  quota	
  enforcement	
  
page
© 2014 VoltDB PROPRIETARY
REAL TIME SCORING EXAMPLE
OLAP / Hadoop
User	
  segmenta6on	
  model	
  
Calculated	
  on	
  Big	
  Side	
  and	
  
cached	
  in	
  Fast	
  Side	
  
Personaliza6on	
  requests	
  
Score	
  based	
  responses	
  
Game	
  play	
  events	
  and	
  scoring	
  decisions	
  
exported	
  to	
  Big	
  
7
page
FAST AND BIG IN COMBINATION
•  Fast Profile
•  In memory: user segmentation
- GB to TB (300M+ rows)
•  10k to 1M+ requests/sec 
•  99 percentile latency under
5ms. (5x9’s under 50ms)
•  VoltDB export to Vertica

•  Big Profile
•  TB to PB of historical data
•  Columnar analytics for fast
reporting.
•  Real time ingest of historical
data (possibly via VoltDB)
•  Vertica UDX to VoltDB
page
© 2014 VoltDB PROPRIETARY
TYPICAL FAST QUESTIONS
9
Hadoop	
  
SQL	
  OLAP	
  
Fast	
  
•  Is	
  the	
  fast	
  layer	
  streaming?	
  
•  It	
  is	
  oSen	
  more	
  like	
  OLTP	
  
•  How	
  do	
  the	
  pieces	
  communicate?	
  
•  OLAP	
  analyHcs	
  from	
  Big	
  -­‐>	
  Fast	
  
•  New	
  events	
  from	
  Fast	
  -­‐>	
  Big	
  
•  Where	
  do	
  “analy7cs”	
  belong?	
  
•  AnalyHcs	
  with	
  decisions:	
  with	
  Fast	
  
•  AnalyHcs	
  against	
  history:	
  with	
  Big	
  
•  Are	
  streaming	
  frameworks	
  equivalent?	
  
•  TradiHonal	
  SQL	
  CEP	
  (Esper)	
  
•  Tuple	
  DAGs	
  (Storm)	
  
•  Window	
  processors	
  on	
  Hadoop	
  (Spark)	
  
	
  
page
© 2014 VoltDB PROPRIETARY
THREE FAST DATA APPLICATION PATTERNS
•  Real-Time Analytics
•  Real-time analytics for operations
•  Real-time KPI measurement
•  Real-time analytics for apps

•  Data Pipelines
•  Streaming data enrichment
•  Sessionization / re-assembly
•  Correlation (by time, by location, by id)
•  Filtering
•  Pre-aggregation
10
•  Fast Request/Response
•  Mobile Authorization
•  Campaign Authorization
•  Fast API Quota Enforcement
•  Micro-Personalization
•  Recommendation Serving
page
© 2014 VoltDB PROPRIETARY
DATA FLOWS
11
Pipeline
Data Lake
HDFS/OLAP/Queue
Ingest
Real Time Analytics
Export
Using Real Time Data
Request/
Response
page
© 2014 VoltDB PROPRIETARY
THE INPUT FEED IS ONLY A SMALL PART OF FAST DATA
Data
 Temporality
Input Feed
 Click stream, tick stream, sensors,
metrics
Real-Time
Analytic Results
Event metadata
 Device version, location, user
profiles, point of interest data
OLAP Analytics Used in
Real-Time Decisions
Responses
12
Examples
Event Stream
Persistent/
Queryable
Persistent
(Look-Ups)
Pipeline Output
Persistent
(Look-Ups)
Event Stream
Event Stream
Counters, streaming aggregates,
Time-series rollups
Scoring models, seasonal usage,
demographic trends
Policy enforcement decisions,
Personalization recommendations
Enriched, filtered, correlated
transform of input feed
page
© 2014 VoltDB PROPRIETARY
THREE REQUIREMENTS CREATE STATE

1.  RT analytics outputs must be queryable
2.  Metadata, dimension data, “lookup tables” to
create groupings for analytics and to supply
enrichment data
3.  Grouping, filtering and aggregating generate
intermediate state – open sessions, partially
assembled logical events
13
page
© 2014 VoltDB PROPRIETARY
STORM: A COMMON ALTERNATIVE
•  Spouts and Bolts
•  Streaming computation
•  Run snippets of java against each event
•  Connect queues to backends with
intermediate code
14
But…	
  
1.  Need	
  “lookup”	
  database	
  for	
  dimension	
  data.	
  
2.  Need	
  a	
  “serving”	
  database	
  for	
  analyHc	
  results	
  
3.  Need	
  addiHonal	
  management	
  clusters	
  (ZooKeeper)	
  
4.  No	
  ad-­‐hoc	
  queries.	
  
5.  Lots	
  of	
  custom	
  code	
  (rarely	
  declaraHve).	
  
page
© 2014 VoltDB PROPRIETARY
STREAMING OPERATORS NEED STATE
Require State
•  Filter
•  Join
•  Aggregate
•  Group By
Stateless
•  Partition
15
page
© 2014 VoltDB PROPRIETARY
VOLTDB: REAL-TIME ANALYTICS
16
VoltDB
Metadata	
  
(Dimension	
  table)	
  
Session	
  state	
  
(Fact	
  table)	
   •  Operational analytics and
monitoring
•  RT analytics enabling user-
facing applications
•  KPI for internal BI/Dashboards
•  In-memory MPP SQL over
ODBC/JDBC
•  Cheap + correct materialized
views for streaming aggregations
SQL,	
  Views	
  
Ingest
page
© 2014 VoltDB PROPRIETARY
VOLTDB: REQUEST/RESPONSE DECISIONS
17
•  Authorization
•  RT balance checks, quota
enforcement 
•  Personalization and
Recommendation Serving
•  Combine pre-score with immediate
context
•  Fully ACID transaction model.
•  Thousands to Millions per second
•  At less than 5ms latencies
Metadata	
  
(Dimension	
  table)	
  
Session	
  state	
  
(Fact	
  table)	
  
ACID	
  TransacHons	
  
page
© 2014 VoltDB PROPRIETARY
VOLTDB: DATA PIPELINES WITH EXPORT
18
VoltDB
Metadata	
  
(Dimension	
  table)	
  
Session	
  state	
  
(Fact	
  table)	
  
•  Filtering (ex: only RFID /
iBeacon readings that show
change from previous location).
•  Sessionization
•  Common version re-writing
•  Data enrichment

•  MPP streaming Export
•  Row data, Thrift messages, CSV
•  OLAP, HDFS and message queues
Export
page
© 2014 VoltDB PROPRIETARY
PIPELINE DEPLOYED: VOLTDB… 

19
Manages	
  game	
  state	
  for	
  online	
  poker	
  and	
  archives	
  completed	
  games	
  to	
  Hadoop.	
  
	
  
Ingests	
  smart	
  meter	
  readings	
  from	
  concentrators,	
  supports	
  real	
  7me	
  applica7ons	
  and	
  
buffers	
  data	
  for	
  end	
  of	
  day	
  billing	
  media7on	
  systems.	
  
	
  
page
© 2014 VoltDB PROPRIETARY
PIPELINE DEPLOYED: VOLTDB… 

20
Ingests	
  RFID	
  readings,	
  supports	
  real	
  Hme	
  applicaHons	
  that	
  push	
  social	
  media	
  updates	
  based	
  
on	
  VoltDB	
  leaderboards.	
  
	
  
Processes	
  clickstream	
  logs	
  and	
  exports	
  correlated	
  USERID	
  records	
  for	
  use	
  at	
  CDN	
  endpoints	
  
for	
  adverHsing	
  targeHng	
  
	
  
Processes	
  SKU	
  catalogs	
  from	
  suppliers	
  to	
  produce	
  correlated	
  catalog	
  that	
  is	
  exported	
  to	
  
indexing	
  and	
  post-­‐processing	
  for	
  an	
  online	
  retailer.	
  
page
© 2014 VoltDB PROPRIETARY
VOLTDB EXPORT ANSWERS THE QUESTIONS:

How do I stream filtered,
enriched, updated results
to OLAP/HDFS systems?
21
How do I send alerts,
alarms, SMS, or messages
to downstream
applications?
page
© 2014 VoltDB PROPRIETARY
VOLTDB EXPORT UI
CREATE TABLE events (!
EventID INTEGER,!
time TIMESTAMP,!
msg VARCHAR(128));!
EXPORT TABLE events;!
22
<export enabled="true"
target="file">!
ddl.sql!
deployment.xml!INSERT into TABLE values…!
Application SQL!
page
© 2014 VoltDB PROPRIETARY
EXPORTING TO HDFS
<export enabled="true" target="http">!
<configuration>!
<property name="endpoint">!
http://hadoopserver/webhdfs/v1.0/%t/%p.%t.%g.csv!
</property>!
</configuration>!
</export>
page
© 2014 VoltDB PROPRIETARY
EXPORT FORMATS
•  CSV
•  TSV
•  Avro container
•  Raw data
page
© 2014 VoltDB PROPRIETARY
EXTENSIBLE API
25
public void onBlockStart() throws RestartBlockException;{!
}!
!
public boolean processRow(int rowSize, byte[] rowData) throws
"RestartBlockException {!
}!
!
public void onBlockCompletion() throws RestartBlockException {!
}!
All	
  of	
  these	
  export	
  connectors	
  are	
  hosted	
  plugins	
  to	
  the	
  VoltDB	
  
database.	
  VoltDB	
  manages	
  HA,	
  fault	
  tolerance,	
  configura6on,	
  
and	
  MPP	
  scale-­‐out.	
  
page
© 2014 VoltDB PROPRIETARY
DATA ARCHITECTURE FOR FAST + BIG DATA

Enterprise Apps
ETL
CR
M
ER
P
Etc
.
Data Lake
(HDFS, etc.)
BIG DATA
SQL on
Hadoop
Map
Reduce
Exploratory
Analytics
BI
Reporting
Fast Operational
Database
FAST DATA
Export
Ingest /
Interactive
Streaming
Analytics
Fast Serve
Analytics
Decisioning
26
page
© 2014 VoltDB PROPRIETARY
 27
page
© 2014 VoltDB PROPRIETARY
 page
THANK YOU!
28
page
© 2014 VoltDB PROPRIETARY
VOLTDB:
29
•  We	
  say	
  “Ingest	
  &Export”	
  vs.	
  Spout	
  and	
  Bolt	
  
•  Scale	
  (ACID)	
  snippets	
  of	
  Java	
  for	
  each	
  
incoming	
  event.	
  
	
  
AND…	
  
•  Actually	
  serve	
  real	
  7me	
  analy7cs	
  via	
  SQL	
  
•  Metadata	
  for	
  lookup/enrichment	
  implicit	
  in	
  DB	
  func7on	
  
•  Integrate	
  with	
  OLAP	
  systems	
  to	
  use	
  OLAP	
  reports	
  with	
  event-­‐
based	
  processing	
  
•  Generate	
  fast	
  transac7onal	
  responses	
  
•  Support	
  ad-­‐hoc	
  queryability	
  
•  Declara7ve	
  aggrega7ons	
  vs.	
  code	
  
•  Fast:	
  no	
  need	
  to	
  micro-­‐batch	
  
page
© 2014 VoltDB PROPRIETARY
VOLTDB: DATA PIPELINES WITH EXPORT
30
VoltDB
Metadata	
  
(Dimension	
  table)	
  
Session	
  state	
  
(Fact	
  table)	
  
•  Filtering (ex: only RFID /
iBeacon readings that show
change from previous location).
•  Sessionization
•  Common version re-writing
•  Data enrichment

•  MPP streaming Export
•  Row data, Thrift messages, CSV
•  OLAP, HDFS and message queues
Export
page
© 2014 VoltDB PROPRIETARY
INTEGRATING DATA SOURCES WITH VOLTDB
•  CSV	
  loader	
  
•  Kaga	
  loader	
  
•  JDBC	
  loader	
  
•  Vertica UDx	
  
•  Extensible	
  loader	
  API	
  
•  JDBC	
  
•  ODBC	
  
•  HTTP	
  JSON	
  
•  NaHve	
  client	
  drivers	
  /	
  SDKs	
  
BULK	
  LOADERS	
   APPLICATION	
  INTERFACES	
  
page
© 2014 VoltDB PROPRIETARY
INTEGRATING WITH CSV DATA
csvloader volttable -f data.csv!
page
© 2014 VoltDB PROPRIETARY
INTEGRATING WITH KAFKA
kafkaloader volttable !
--zookeeper=zkserver:2181 !
--topic=topicname!
page
© 2014 VoltDB PROPRIETARY
INTEGRATING WITH JDBC SOURCES
jdbcloader volttable !
--jdbcurl=jdbc:postgresql://server/db !
--jdbcdriver=org.postgresql.Driver !
--jdbctable=table
page
© 2014 VoltDB PROPRIETARY
INTEGRATING WITH HP VERTICA UDX
SELECT voltdbload(c1, c2, c3!
USING PARAMETERS voltservers='localhost’,!
volttable=‘volttable’)!
FROM T;!
page
© 2014 VoltDB PROPRIETARY
EXTENSIBLE VOLTBULKLOADER: BULK SMASH
36
VoltBulkLoader loader =!
"client.getNewBulkLoader(tableName, maxBatchSize,
"failureCallback);!
for (…) {!
loader.insertRow(handle, values);!
}!
loader.drain();!
loader.close();!
All	
  of	
  these	
  tools	
  are	
  built	
  on	
  a	
  MIT	
  licensed	
  extensible	
  API	
  that	
  
provides	
  performance	
  op6miza6ons,	
  batching,	
  load	
  balancing.	
  
page
© 2014 VoltDB PROPRIETARY
NATIVE CLIENT LIBRARIES
•  Java
•  C++
•  PHP
•  Node.js	
  
•  Go	
  
•  Python	
  
•  Erlang	
  
•  Ruby	
  
Or,	
  just…	
  
curl ‘http://localhost:8080/api/1.0/?!
Procedure=Vote&Parameters=[1,1,0]’!
	
  
page
© 2014 VoltDB PROPRIETARY
VOLTDB EXPORT TOPOLOGIES
•  VoltDB -> Queue (Kafka, RabbitMQ)
•  VoltDB -> HDFS (for Pig/Hive/etc. processing)
•  VoltDB -> OLAP (Vertica, Netezza..)
•  VoltDB -> HTTP Endpoint, i.e: ElasticSearch
38
page
© 2014 VoltDB PROPRIETARY
VOLTDB EXPORT PROGRAMMING CONTRACT
•  Export data is durable until exported
•  MPP scale-out of export data flows
•  At-least-once delivery during HA events
•  Built-in row ids (for uniqueness filtering)
•  Extensible API for open source connectors
39
page
© 2014 VoltDB PROPRIETARY
INTEGRATING VOLTDB WITH EXPORT TARGETS
40
•  Local	
  file	
  system	
  export	
  
•  JDBC	
  export	
  
•  Kaga	
  export	
  
•  RabbitMQ	
  export	
  
•  HDFS	
  export	
  
•  HTTP	
  export	
  
•  Extensible	
  API	
  
page
© 2014 VoltDB PROPRIETARY
EXPORTING TO LOCAL FILE SYSTEM
<export enabled="true" target="file">!
<configuration>!
<property name="type">csv</property>!
<property name="nonce">MyExport</property>!
</configuration>!
</export>!
page
© 2014 VoltDB PROPRIETARY
EXPORTING TO JDBC DESTINATIONS
<export enabled="true" target=”jdbc">!
<configuration>!
<property name=”jdbcurl">jdbc:postgresql://server/db</
property>!
<property name=”jdbcuser">guest</property>!
</configuration>!
</export>
page
© 2014 VoltDB PROPRIETARY
EXPORTING TO KAFKA
<export enabled="true" target=”kafka">!
<configuration>!
<property name=“metadata.broker.list”>server1</property>!
</configuration>!
</export>
page
© 2014 VoltDB PROPRIETARY
EXPORTING TO RABBITMQ
<export enabled="true" target="rabbitmq">!
<configuration>!
<property name="broker.host”>server1</property>!
</configuration>!
</export>

Building Fast Applications for Streaming Data

  • 1.
    page BUILDING FAST DATAAPPLICATIONS WITH STREAMING DATA Ryan Betts, CTO VoltDB 1
  • 2.
    page © 2014 VoltDBPROPRIETARY page AGENDA •  Fast Data Application Patterns •  Digging Deeper: Looking at the Data •  Streaming Approach •  DB Approach •  Summary 2
  • 3.
    page © 2014 VoltDBPROPRIETARY Collect   Explore   Analyze   Act   Data leads to applications Applications create more data 3
  • 4.
    page © 2014 VoltDBPROPRIETARY DATA ARCHITECTURE FOR FAST + BIG DATA Enterprise Apps ETL CR M ER P Etc . Data Lake (HDFS, etc.) BIG DATA SQL on Hadoop Map Reduce Exploratory Analytics BI Reporting Fast Operational Database FAST DATA Export Ingest / Interactive Streaming Analytics Fast Serve Analytics Decisioning 4
  • 5.
    page © 2014 VoltDBPROPRIETARY IN THE BIG CORNER Systems facilitating exploration and analytics of large data sets 5 Example  Technologies   Columnar  OLAP  warehouses   Hadoop  Ecosystem   •  MapReduce   •  Hive,  Pig   •  SQL.next:  Impala,  Drill,  Shark   Example  Applica7ons   •  User  segmentaHon  &  pre-­‐scoring   •  Seasonal  trending   •  RecommendaHon  matrix  calculaHons   •  Building  search  indexes   •  Data  Science:  staHsHcal  clustering,   Machine  learning  
  • 6.
    page © 2014 VoltDBPROPRIETARY IN THE FAST CORNER Systems facilitating real time ingest, analytics and decisions against incoming event feeds 6 Example  Technologies   •  Streaming  frameworks   •  VoltDB     Example  Applica7ons   •  Micro-­‐personalizaHon   •  RecommendaHon  serving   •  AlerHng/alarming   •  OperaHonal  monitoring   •  Data  enrichment  (ETL  eliminaHon)   •  High  throughput  authorizaHon   •  Ex:  API  quota  enforcement  
  • 7.
    page © 2014 VoltDBPROPRIETARY REAL TIME SCORING EXAMPLE OLAP / Hadoop User  segmenta6on  model   Calculated  on  Big  Side  and   cached  in  Fast  Side   Personaliza6on  requests   Score  based  responses   Game  play  events  and  scoring  decisions   exported  to  Big   7
  • 8.
    page FAST AND BIGIN COMBINATION •  Fast Profile •  In memory: user segmentation - GB to TB (300M+ rows) •  10k to 1M+ requests/sec •  99 percentile latency under 5ms. (5x9’s under 50ms) •  VoltDB export to Vertica •  Big Profile •  TB to PB of historical data •  Columnar analytics for fast reporting. •  Real time ingest of historical data (possibly via VoltDB) •  Vertica UDX to VoltDB
  • 9.
    page © 2014 VoltDBPROPRIETARY TYPICAL FAST QUESTIONS 9 Hadoop   SQL  OLAP   Fast   •  Is  the  fast  layer  streaming?   •  It  is  oSen  more  like  OLTP   •  How  do  the  pieces  communicate?   •  OLAP  analyHcs  from  Big  -­‐>  Fast   •  New  events  from  Fast  -­‐>  Big   •  Where  do  “analy7cs”  belong?   •  AnalyHcs  with  decisions:  with  Fast   •  AnalyHcs  against  history:  with  Big   •  Are  streaming  frameworks  equivalent?   •  TradiHonal  SQL  CEP  (Esper)   •  Tuple  DAGs  (Storm)   •  Window  processors  on  Hadoop  (Spark)    
  • 10.
    page © 2014 VoltDBPROPRIETARY THREE FAST DATA APPLICATION PATTERNS •  Real-Time Analytics •  Real-time analytics for operations •  Real-time KPI measurement •  Real-time analytics for apps •  Data Pipelines •  Streaming data enrichment •  Sessionization / re-assembly •  Correlation (by time, by location, by id) •  Filtering •  Pre-aggregation 10 •  Fast Request/Response •  Mobile Authorization •  Campaign Authorization •  Fast API Quota Enforcement •  Micro-Personalization •  Recommendation Serving
  • 11.
    page © 2014 VoltDBPROPRIETARY DATA FLOWS 11 Pipeline Data Lake HDFS/OLAP/Queue Ingest Real Time Analytics Export Using Real Time Data Request/ Response
  • 12.
    page © 2014 VoltDBPROPRIETARY THE INPUT FEED IS ONLY A SMALL PART OF FAST DATA Data Temporality Input Feed Click stream, tick stream, sensors, metrics Real-Time Analytic Results Event metadata Device version, location, user profiles, point of interest data OLAP Analytics Used in Real-Time Decisions Responses 12 Examples Event Stream Persistent/ Queryable Persistent (Look-Ups) Pipeline Output Persistent (Look-Ups) Event Stream Event Stream Counters, streaming aggregates, Time-series rollups Scoring models, seasonal usage, demographic trends Policy enforcement decisions, Personalization recommendations Enriched, filtered, correlated transform of input feed
  • 13.
    page © 2014 VoltDBPROPRIETARY THREE REQUIREMENTS CREATE STATE 1.  RT analytics outputs must be queryable 2.  Metadata, dimension data, “lookup tables” to create groupings for analytics and to supply enrichment data 3.  Grouping, filtering and aggregating generate intermediate state – open sessions, partially assembled logical events 13
  • 14.
    page © 2014 VoltDBPROPRIETARY STORM: A COMMON ALTERNATIVE •  Spouts and Bolts •  Streaming computation •  Run snippets of java against each event •  Connect queues to backends with intermediate code 14 But…   1.  Need  “lookup”  database  for  dimension  data.   2.  Need  a  “serving”  database  for  analyHc  results   3.  Need  addiHonal  management  clusters  (ZooKeeper)   4.  No  ad-­‐hoc  queries.   5.  Lots  of  custom  code  (rarely  declaraHve).  
  • 15.
    page © 2014 VoltDBPROPRIETARY STREAMING OPERATORS NEED STATE Require State •  Filter •  Join •  Aggregate •  Group By Stateless •  Partition 15
  • 16.
    page © 2014 VoltDBPROPRIETARY VOLTDB: REAL-TIME ANALYTICS 16 VoltDB Metadata   (Dimension  table)   Session  state   (Fact  table)   •  Operational analytics and monitoring •  RT analytics enabling user- facing applications •  KPI for internal BI/Dashboards •  In-memory MPP SQL over ODBC/JDBC •  Cheap + correct materialized views for streaming aggregations SQL,  Views   Ingest
  • 17.
    page © 2014 VoltDBPROPRIETARY VOLTDB: REQUEST/RESPONSE DECISIONS 17 •  Authorization •  RT balance checks, quota enforcement •  Personalization and Recommendation Serving •  Combine pre-score with immediate context •  Fully ACID transaction model. •  Thousands to Millions per second •  At less than 5ms latencies Metadata   (Dimension  table)   Session  state   (Fact  table)   ACID  TransacHons  
  • 18.
    page © 2014 VoltDBPROPRIETARY VOLTDB: DATA PIPELINES WITH EXPORT 18 VoltDB Metadata   (Dimension  table)   Session  state   (Fact  table)   •  Filtering (ex: only RFID / iBeacon readings that show change from previous location). •  Sessionization •  Common version re-writing •  Data enrichment •  MPP streaming Export •  Row data, Thrift messages, CSV •  OLAP, HDFS and message queues Export
  • 19.
    page © 2014 VoltDBPROPRIETARY PIPELINE DEPLOYED: VOLTDB… 19 Manages  game  state  for  online  poker  and  archives  completed  games  to  Hadoop.     Ingests  smart  meter  readings  from  concentrators,  supports  real  7me  applica7ons  and   buffers  data  for  end  of  day  billing  media7on  systems.    
  • 20.
    page © 2014 VoltDBPROPRIETARY PIPELINE DEPLOYED: VOLTDB… 20 Ingests  RFID  readings,  supports  real  Hme  applicaHons  that  push  social  media  updates  based   on  VoltDB  leaderboards.     Processes  clickstream  logs  and  exports  correlated  USERID  records  for  use  at  CDN  endpoints   for  adverHsing  targeHng     Processes  SKU  catalogs  from  suppliers  to  produce  correlated  catalog  that  is  exported  to   indexing  and  post-­‐processing  for  an  online  retailer.  
  • 21.
    page © 2014 VoltDBPROPRIETARY VOLTDB EXPORT ANSWERS THE QUESTIONS: How do I stream filtered, enriched, updated results to OLAP/HDFS systems? 21 How do I send alerts, alarms, SMS, or messages to downstream applications?
  • 22.
    page © 2014 VoltDBPROPRIETARY VOLTDB EXPORT UI CREATE TABLE events (! EventID INTEGER,! time TIMESTAMP,! msg VARCHAR(128));! EXPORT TABLE events;! 22 <export enabled="true" target="file">! ddl.sql! deployment.xml!INSERT into TABLE values…! Application SQL!
  • 23.
    page © 2014 VoltDBPROPRIETARY EXPORTING TO HDFS <export enabled="true" target="http">! <configuration>! <property name="endpoint">! http://hadoopserver/webhdfs/v1.0/%t/%p.%t.%g.csv! </property>! </configuration>! </export>
  • 24.
    page © 2014 VoltDBPROPRIETARY EXPORT FORMATS •  CSV •  TSV •  Avro container •  Raw data
  • 25.
    page © 2014 VoltDBPROPRIETARY EXTENSIBLE API 25 public void onBlockStart() throws RestartBlockException;{! }! ! public boolean processRow(int rowSize, byte[] rowData) throws "RestartBlockException {! }! ! public void onBlockCompletion() throws RestartBlockException {! }! All  of  these  export  connectors  are  hosted  plugins  to  the  VoltDB   database.  VoltDB  manages  HA,  fault  tolerance,  configura6on,   and  MPP  scale-­‐out.  
  • 26.
    page © 2014 VoltDBPROPRIETARY DATA ARCHITECTURE FOR FAST + BIG DATA Enterprise Apps ETL CR M ER P Etc . Data Lake (HDFS, etc.) BIG DATA SQL on Hadoop Map Reduce Exploratory Analytics BI Reporting Fast Operational Database FAST DATA Export Ingest / Interactive Streaming Analytics Fast Serve Analytics Decisioning 26
  • 27.
    page © 2014 VoltDBPROPRIETARY 27
  • 28.
    page © 2014 VoltDBPROPRIETARY page THANK YOU! 28
  • 29.
    page © 2014 VoltDBPROPRIETARY VOLTDB: 29 •  We  say  “Ingest  &Export”  vs.  Spout  and  Bolt   •  Scale  (ACID)  snippets  of  Java  for  each   incoming  event.     AND…   •  Actually  serve  real  7me  analy7cs  via  SQL   •  Metadata  for  lookup/enrichment  implicit  in  DB  func7on   •  Integrate  with  OLAP  systems  to  use  OLAP  reports  with  event-­‐ based  processing   •  Generate  fast  transac7onal  responses   •  Support  ad-­‐hoc  queryability   •  Declara7ve  aggrega7ons  vs.  code   •  Fast:  no  need  to  micro-­‐batch  
  • 30.
    page © 2014 VoltDBPROPRIETARY VOLTDB: DATA PIPELINES WITH EXPORT 30 VoltDB Metadata   (Dimension  table)   Session  state   (Fact  table)   •  Filtering (ex: only RFID / iBeacon readings that show change from previous location). •  Sessionization •  Common version re-writing •  Data enrichment •  MPP streaming Export •  Row data, Thrift messages, CSV •  OLAP, HDFS and message queues Export
  • 31.
    page © 2014 VoltDBPROPRIETARY INTEGRATING DATA SOURCES WITH VOLTDB •  CSV  loader   •  Kaga  loader   •  JDBC  loader   •  Vertica UDx   •  Extensible  loader  API   •  JDBC   •  ODBC   •  HTTP  JSON   •  NaHve  client  drivers  /  SDKs   BULK  LOADERS   APPLICATION  INTERFACES  
  • 32.
    page © 2014 VoltDBPROPRIETARY INTEGRATING WITH CSV DATA csvloader volttable -f data.csv!
  • 33.
    page © 2014 VoltDBPROPRIETARY INTEGRATING WITH KAFKA kafkaloader volttable ! --zookeeper=zkserver:2181 ! --topic=topicname!
  • 34.
    page © 2014 VoltDBPROPRIETARY INTEGRATING WITH JDBC SOURCES jdbcloader volttable ! --jdbcurl=jdbc:postgresql://server/db ! --jdbcdriver=org.postgresql.Driver ! --jdbctable=table
  • 35.
    page © 2014 VoltDBPROPRIETARY INTEGRATING WITH HP VERTICA UDX SELECT voltdbload(c1, c2, c3! USING PARAMETERS voltservers='localhost’,! volttable=‘volttable’)! FROM T;!
  • 36.
    page © 2014 VoltDBPROPRIETARY EXTENSIBLE VOLTBULKLOADER: BULK SMASH 36 VoltBulkLoader loader =! "client.getNewBulkLoader(tableName, maxBatchSize, "failureCallback);! for (…) {! loader.insertRow(handle, values);! }! loader.drain();! loader.close();! All  of  these  tools  are  built  on  a  MIT  licensed  extensible  API  that   provides  performance  op6miza6ons,  batching,  load  balancing.  
  • 37.
    page © 2014 VoltDBPROPRIETARY NATIVE CLIENT LIBRARIES •  Java •  C++ •  PHP •  Node.js   •  Go   •  Python   •  Erlang   •  Ruby   Or,  just…   curl ‘http://localhost:8080/api/1.0/?! Procedure=Vote&Parameters=[1,1,0]’!  
  • 38.
    page © 2014 VoltDBPROPRIETARY VOLTDB EXPORT TOPOLOGIES •  VoltDB -> Queue (Kafka, RabbitMQ) •  VoltDB -> HDFS (for Pig/Hive/etc. processing) •  VoltDB -> OLAP (Vertica, Netezza..) •  VoltDB -> HTTP Endpoint, i.e: ElasticSearch 38
  • 39.
    page © 2014 VoltDBPROPRIETARY VOLTDB EXPORT PROGRAMMING CONTRACT •  Export data is durable until exported •  MPP scale-out of export data flows •  At-least-once delivery during HA events •  Built-in row ids (for uniqueness filtering) •  Extensible API for open source connectors 39
  • 40.
    page © 2014 VoltDBPROPRIETARY INTEGRATING VOLTDB WITH EXPORT TARGETS 40 •  Local  file  system  export   •  JDBC  export   •  Kaga  export   •  RabbitMQ  export   •  HDFS  export   •  HTTP  export   •  Extensible  API  
  • 41.
    page © 2014 VoltDBPROPRIETARY EXPORTING TO LOCAL FILE SYSTEM <export enabled="true" target="file">! <configuration>! <property name="type">csv</property>! <property name="nonce">MyExport</property>! </configuration>! </export>!
  • 42.
    page © 2014 VoltDBPROPRIETARY EXPORTING TO JDBC DESTINATIONS <export enabled="true" target=”jdbc">! <configuration>! <property name=”jdbcurl">jdbc:postgresql://server/db</ property>! <property name=”jdbcuser">guest</property>! </configuration>! </export>
  • 43.
    page © 2014 VoltDBPROPRIETARY EXPORTING TO KAFKA <export enabled="true" target=”kafka">! <configuration>! <property name=“metadata.broker.list”>server1</property>! </configuration>! </export>
  • 44.
    page © 2014 VoltDBPROPRIETARY EXPORTING TO RABBITMQ <export enabled="true" target="rabbitmq">! <configuration>! <property name="broker.host”>server1</property>! </configuration>! </export>