HOW TO BUILD REAL-TIME STREAMING
ANALYTICS WITH AN IN-MEMORY, SCALE-OUT
SQL DATABASE
Ryan Betts, CTO
VoltDB
© 2015 VoltDB PROPRIETARY
OUR SPEAKER
Ryan Betts
CTO at VoltDB
AGENDA
• Setup: Fast vs. Big
• Fast data application requirements
• The role of analytics
• Concrete examples
[Diagram: Collect, Explore, Analyze, Act]
Big Data analytic results:
1. Discoveries: seasonal predictions,
scientific results, long-term
capacity planning
2. Optimizations: market
segmentation, fraud heuristics,
optimal customer journey
DATA ARCHITECTURE FOR FAST + BIG DATA
[Architecture diagram. Labels: Enterprise Apps (CRM, ERP, etc.); ETL; FAST DATA: Fast Operational Database, Ingest / Interactive, Real-time Analytics, Fast Serve Analytics, Decisioning, Export; BIG DATA: Data Lake (HDFS), Non Relational Processing, BI Reporting, Data Warehouse, Columnar, Analytics, OLAP]
Fast (in motion)
• Streaming Analytics: real-time summary and aggregation
• Transaction Processing: per-event decisions using context + history
Big (at rest)
• Exploration: data science, investigation of large data sets
• Reporting: recommendation matrices, search indexes, trend and BI
MODERN OLTP
1. Processing streams requires integrated access to state.
2. Using real time analytics requires a query interface.
3. Reacting to incoming events requires transactions.
State + Query + Transactions = OLTP
Fast: Streaming Analytics, Transaction Processing
Continuous Query
• Materialized Views
• Capped Tables
• Ranking Indexes
Transactions
• Per-event Java + SQL
• ACID processing
• Millisecond latency responses
Transformations
• Loaders/Importers
• Export Connectors
• State for sessionization, enrichment
VoltDB Architecture: Commodity HW, HA + ACID, Scale-out, VM-friendly
MATERIALIZED VIEWS
• Declarative SQL
• Fully transactional
• Supports ad-hoc query
CREATE VIEW registrations_by_zipcode (
  zipcode, registered_voters
) AS
  SELECT zipcode, COUNT(*) FROM voters
  WHERE registration = 1 GROUP BY zipcode;
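A view defined this way can stay current if every write to the base table adjusts the matching group's count, rather than recomputing the aggregate at query time. The following is a minimal plain-Java sketch of that idea. It is not VoltDB code, and the class and method names (`RegistrationsByZipcode`, `onInsert`, `onDelete`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the registrations_by_zipcode view.
// Illustrative only: it is not how VoltDB implements materialized views.
class RegistrationsByZipcode {
    private final Map<String, Long> registeredVoters = new HashMap<>();

    // Mirrors an INSERT into voters: only rows with registration = 1 are counted.
    void onInsert(String zipcode, int registration) {
        if (registration == 1) {
            registeredVoters.merge(zipcode, 1L, Long::sum);
        }
    }

    // Mirrors a DELETE from voters: remove the row's contribution to its group.
    void onDelete(String zipcode, int registration) {
        if (registration == 1) {
            // Returning null drops the group once its count reaches zero.
            registeredVoters.computeIfPresent(zipcode, (z, n) -> n > 1 ? n - 1 : null);
        }
    }

    // Plays the role of: SELECT registered_voters FROM the view WHERE zipcode = ?
    long registeredVoters(String zipcode) {
        return registeredVoters.getOrDefault(zipcode, 0L);
    }
}
```

Reads against the aggregate are then a single lookup, which is why a view of this kind can serve ad-hoc queries cheaply.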
MV FOR STREAMING AGGREGATION
• Partitioned on cluster
• Immediately up-to-date
• Active/active HA
Global Read: SELECT SUM(count) WHERE sec > 130 AND sec < 140;
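The global read above can be pictured as a sum over per-partition counts. The sketch below is a plain-Java illustration under stated assumptions: events are routed to a partition by key hash, each partition keeps its own per-second counts, and a global read adds up the matching seconds from every partition. The names `PartitionedSecondCounts`, `record`, and `sumBetween` are hypothetical, and replication is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a per-second count view that is partitioned across a cluster.
class PartitionedSecondCounts {
    // One sorted map (second -> count) per partition.
    private final List<TreeMap<Long, Long>> partitions = new ArrayList<>();

    PartitionedSecondCounts(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new TreeMap<>());
        }
    }

    // Route the event to one partition by key hash and bump that partition's count.
    void record(String partitionKey, long sec) {
        int p = Math.floorMod(partitionKey.hashCode(), partitions.size());
        partitions.get(p).merge(sec, 1L, Long::sum);
    }

    // Global read: sum counts with lowExclusive < sec < highExclusive over all partitions.
    long sumBetween(long lowExclusive, long highExclusive) {
        long total = 0;
        for (TreeMap<Long, Long> partition : partitions) {
            for (long count : partition.subMap(lowExclusive, false, highExclusive, false).values()) {
                total += count;
            }
        }
        return total;
    }
}
```

Calling `sumBetween(130, 140)` corresponds to the `sec > 130 AND sec < 140` predicate on the slide.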
MATERIALIZED VIEWS WITH ACID TRANSACTIONS
• Can be queried as part of a transaction
• Example: fast quota enforcement
[Chart: 1-partition throughput (transactions/second), 10 GB of data being aggregated]
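Quota enforcement is a useful case because the check and the update must happen as one unit: read the running aggregate, decide, and record the usage without another event slipping in between. Here is a minimal plain-Java sketch, assuming a per-user limit. The `synchronized` methods stand in for transactional isolation, and `QuotaEnforcer` and `tryConsume` are hypothetical names, not VoltDB APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical quota check: the map plays the role of an aggregate view of usage per user.
class QuotaEnforcer {
    private final long limitPerUser;
    private final Map<String, Long> usedByUser = new HashMap<>();

    QuotaEnforcer(long limitPerUser) {
        this.limitPerUser = limitPerUser;
    }

    // Read the aggregate, decide, and update as one atomic step.
    // synchronized is a stand-in for running inside a serializable transaction.
    synchronized boolean tryConsume(String user, long units) {
        long used = usedByUser.getOrDefault(user, 0L);
        if (used + units > limitPerUser) {
            return false; // over quota: reject and leave state unchanged
        }
        usedByUser.put(user, used + units);
        return true;
    }

    synchronized long used(String user) {
        return usedByUser.getOrDefault(user, 0L);
    }
}
```

Because the aggregate is maintained on write, the decision costs one lookup per event instead of a scan over the user's history.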
CAPPED COLLECTIONS
• Simple windows
• Durable, queryable
• Support Mat. Views
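A capped collection behaves like a simple window: once the row limit is reached, each insert evicts the oldest row. The plain-Java sketch below illustrates that behavior along with a running sum kept in step with the window, as a view on the table would be. Durability is not modeled, and `CappedWindow` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical capped table holding at most maxRows values, oldest evicted first.
class CappedWindow {
    private final int maxRows;
    private final Deque<Long> rows = new ArrayDeque<>();
    private long sum; // aggregate maintained alongside the window

    CappedWindow(int maxRows) {
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive");
        }
        this.maxRows = maxRows;
    }

    // Insert a row; if the table is full, evict the oldest row first.
    void insert(long value) {
        if (rows.size() == maxRows) {
            sum -= rows.removeFirst();
        }
        rows.addLast(value);
        sum += value;
    }

    int size() {
        return rows.size();
    }

    long sum() {
        return sum;
    }
}
```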
RANKING INDEXES FOR LEADERBOARDS
• Sorted indexes are order-statistic trees for O(log(n)) ranking
• Quickly find overall rank
• Quickly count items in range
SELECT COUNT(*) FROM scores WHERE score > 281;
SELECT COUNT(*) FROM scores WHERE score >= 10 AND score <= 200;
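To show why such counts can be answered in logarithmic time, here is a small plain-Java sketch. Note the substitution: it uses a Fenwick (binary indexed) tree over a bounded integer score range, which is a simpler structure than the order-statistic tree the slide describes, but it gives the same O(log n) counting behavior. `ScoreRanking` and its methods are hypothetical names, and the bounded score range is an assumption.

```java
// Hypothetical leaderboard counter over integer scores in [0, maxScore].
// Uses a Fenwick tree, not the order-statistic tree named on the slide.
class ScoreRanking {
    private final long[] tree; // 1-based Fenwick array; index = score + 1
    private final int maxScore;
    private long total;

    ScoreRanking(int maxScore) {
        this.maxScore = maxScore;
        this.tree = new long[maxScore + 2];
    }

    // Record one occurrence of a score in O(log maxScore).
    void add(int score) {
        if (score < 0 || score > maxScore) {
            throw new IllegalArgumentException("score out of range: " + score);
        }
        for (int i = score + 1; i < tree.length; i += i & -i) {
            tree[i]++;
        }
        total++;
    }

    // Number of recorded scores that are <= score.
    private long countAtMost(int score) {
        if (score < 0) {
            return 0;
        }
        long n = 0;
        for (int i = Math.min(score, maxScore) + 1; i > 0; i -= i & -i) {
            n += tree[i];
        }
        return n;
    }

    // SELECT COUNT(*) FROM scores WHERE score > ?
    long countGreaterThan(int score) {
        return total - countAtMost(score);
    }

    // SELECT COUNT(*) FROM scores WHERE score >= lo AND score <= hi
    long countBetween(int lo, int hi) {
        return lo > hi ? 0 : countAtMost(hi) - countAtMost(lo - 1);
    }
}
```

A player's overall rank is then `countGreaterThan(score) + 1`.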
SQL SUPPORT
http://downloads.voltdb.com/documentation/TriFoldDevQuickRef.pdf
• ALTER TABLE|CONSTRAINT|COLUMN|PROCEDURE
• UNIQUE, MULTI-KEY INDEXES
• INDEXES ON COLUMN FUNCTIONS
• SQL ONLY DDL STORED PROCEDURES
• JAVA STORED PROCEDURES
• AUTO-GENERATED CRUD COMMANDS + REST API
• MATERIALIZED VIEWS
• SUBQUERY, UPSERT|INTO, JOIN, SELF-JOIN, INSERT SELECT
• ~60 COLUMN FUNCTIONS
COMBINED JAVA + SQL
• Logic + SQL
• 3rd party code
ACID PROCESSING
• Sync intra-cluster replication
• Replicated durability
• High availability (configurable)
• Serializable isolation
• Atomic ad-hoc or stored procedures
• Partitioned & distributed txns
• Load balanced reads across replicas
ACID MATTERS
• Speed of development
• Richness of application
• Obvious for billing, policy enforcement, authorization
• Equally necessary for aggregation
• Update in place desirable vs. batch process for ingest
Performance – millisecond per-event responses
[Charts: SoftLayer update and read latency (axes: latency in ms, throughput in ops/sec); YCSB Workload B, SoftLayer vs. AWS]
INTEGRATING DATA SOURCES WITH VOLTDB
BULK LOADERS
• CSV loader
• Kafka loader
• JDBC loader
• Vertica UDx
• Extensible loader API
APPLICATION INTERFACES
• JDBC
• ODBC
• HTTP JSON
• Native client drivers / SDKs
VOLTDB EXPORT UI
ddl.sql
CREATE TABLE events (
  EventID INTEGER,
  time TIMESTAMP,
  msg VARCHAR(128));
EXPORT TABLE events;
deployment.xml
<export enabled="true" target="file">
Application SQL
INSERT into TABLE values…
INTEGRATING VOLTDB WITH EXPORT TARGETS
• Local file system export
• JDBC export
• Kafka export
• RabbitMQ export
• HDFS export
• HTTP export
• Extensible API
EXTENSIBLE OPEN SOURCE API
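public void onBlockStart() throws RestartBlockException {
    // Called before the first row of an export block.
}
public boolean processRow(int rowSize, byte[] rowData) throws RestartBlockException {
    // Handle one exported row; returning true is assumed here to signal success.
    return true;
}
public void onBlockCompletion() throws RestartBlockException {
    // Called after the last row of the block.
}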
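The three callbacks suggest a block-oriented flow: start a block, receive its rows one at a time, then finish the block. The sketch below shows one way a connector might use that shape, buffering rows and publishing them only when the block completes. It is a self-contained illustration: `RestartBlockException` is redefined locally as a stand-in so the code compiles without VoltDB, `BufferingDecoder` is a hypothetical class, and treating row bytes as UTF-8 text is an assumption made only for readability.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Local stand-in so the sketch compiles on its own; not the VoltDB class.
class RestartBlockException extends Exception {
    RestartBlockException(String message) {
        super(message);
    }
}

// Hypothetical decoder that buffers a block and delivers it only on completion.
class BufferingDecoder {
    private final List<String> pending = new ArrayList<>();
    final List<String> delivered = new ArrayList<>();

    // A new block begins: discard anything left from an unfinished attempt.
    public void onBlockStart() throws RestartBlockException {
        pending.clear();
    }

    // Buffer one row. Row bytes are treated as UTF-8 text (an assumption for this sketch).
    public boolean processRow(int rowSize, byte[] rowData) throws RestartBlockException {
        if (rowSize < 0 || rowSize > rowData.length) {
            throw new RestartBlockException("row size does not match row data");
        }
        pending.add(new String(rowData, 0, rowSize, StandardCharsets.UTF_8));
        return true;
    }

    // The block is complete: publish the buffered rows together.
    public void onBlockCompletion() throws RestartBlockException {
        delivered.addAll(pending);
        pending.clear();
    }
}
```

Buffering until completion means a block that is restarted partway through does not leave partial output at the destination.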
REVIEW
[Diagram: Application, Event Sources, VoltDB (Client Interface, Partition Replica 1, Partition Replica 2), Export Destination (OLAP, HTTP)]
• SQL + Java transactions
• JSON column values
• HA in-memory processing
• ACID (durable to disk)
• Ranking indexes
• Indexes on functions
• Capped tables
• Materialized views: real-time aggregation
• Append only export
• 1-5 ms @ 99% responses
BIGGER PICTURE
QUESTIONS?
• Use the chat window to type in your questions
• Try VoltDB yourself:
  • Download the Enterprise Edition: www.voltdb.com/download
  • Check out our Sample Apps: www.voltdb.com/community/applications
  • Open source version is available on github.com
THANK YOU!