VoltDB and Hortonworks Present:
Powering Fast Data and the Hadoop Ecosystem
with the New VoltDB V5.0
January 28, 2015
© 2015 VoltDB PROPRIETARY
OUR SPEAKERS
John Piekos, VP Engineering, VoltDB
Ajay Singh, Director, Technical Channels, Hortonworks
John Hugg, Software Engineer, VoltDB
AGENDA
• Overview
• What’s New in v5.0?
• VoltDB and Hadoop
• VoltDB Real-time Analytics Demo
• Q&A
FAST DATA SOURCES AND DRIVERS
Mobile, IoT, Social, Sensors, Logs
Data is doubling every two years
• 26 billion connected devices by 2020 (Gartner 2014)
• 37% of most data will be processed at the edge in milliseconds (Cisco IoT Study 12/11/14)
PREDICTION
All businesses will compete on a new dimension – the ability to
make decisions “in the moment” on Fast Data.
PROBLEM #1
Companies are not tapping the inherent value in fast data because it’s too difficult and expensive.
We make it simple and easy.
PROBLEM #2
Companies hack together a bunch of different products that each do part of the job, tinker with them, and realize only a small part of the opportunity.
We’re a single, integrated platform.
SOLUTION
VoltDB is a purpose-built database platform with the performance,
scale and capability to ingest, analyze and make decisions on fast
data in real time.
VOLTDB AND FAST DATA
VOLTDB: A MODERN ARCHITECTURE FOR FAST DATA
• In-Memory performance
• Scale-out, shared nothing
• Reliability and fault tolerance
• Real-time analytics
• ACID & SQL & Java
• Hadoop integration
WHAT’S NEW IN VOLTDB V5.0
John Piekos
VP of Engineering
WHAT’S NEW IN VOLTDB V5.0?
• Fast Data Integrations
• Fast Data Pipeline Sample Applications
• More SQL. SQL-92.
• Ease of Database Development (traditional API)
• VoltDB Management Center (VMC)
VOLTDB AND THE FAST DATA PIPELINE
FAST DATA INTEGRATIONS - IMPORTERS
• Kafka Loader
  • Subscribe to a Kafka topic and insert each message into a VoltDB table
• JDBC Loader
  • Load a JDBC result set into a VoltDB table
• Vertica UDx
  • User-defined function to load Vertica result sets into a VoltDB table
• Apache Hive and Apache Pig
  • Hadoop OutputFormat to load Hive and Pig result sets into VoltDB
FAST DATA INTEGRATIONS - EXPORTERS
• HDFS Export
  • Hadoop export via WebHDFS and HttpFS
• HTTP Export
  • Delivery and alerting via HTTP POST/GET
• Kafka Export, RabbitMQ Export
  • Message queue delivery
• Configurable export format
  • Avro, CSV, TSV, more coming…
FAST DATA PIPELINE SAMPLE APPLICATION
• Streaming Data, Real-time Analytics
• Export to Hadoop
• Export to OLAP (Vertica, others)
• Place historical decisioning intelligence into VoltDB
• Closed loop, via the Hive/Pig OutputFormat or the Vertica UDx
• Download: https://github.com/VoltDB/app-fastdata
• And see our blog posts: http://voltdb.com/blog/fast-data-look-voltdb-sample-app
LAMBDA ARCHITECTURE SAMPLE APPLICATION
• Simplified Lambda Architecture “Speed Layer”
• Real-Time Analytics
• Serving Layer
• Demonstration at the end of this presentation
SQL
• SQL Subquery
• INSERT INTO … SELECT
• UPSERT
• More JSON
• SET_FIELD() column function
• Shortcut field/path notation
• Query Timeout
• Enhanced Capped Collections
SQL CAPPED COLLECTIONS
CREATE TABLE EVENTS (
    WHEN_OCCURRED TIMESTAMP,
    INFO VARCHAR(256),
    LIMIT PARTITION ROWS 100
        EXECUTE (
            DELETE FROM EVENTS
            ORDER BY WHEN_OCCURRED, INFO
            LIMIT 1
        )
);
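Because each partition keeps at most 100 rows and evicts the oldest (by WHEN_OCCURRED) first, queries against EVENTS facilitate real-time analytics over a “time window” of recent data.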
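To make the EXECUTE clause concrete, here is a minimal in-memory Java sketch of one EVENTS partition. It is not VoltDB code: `CappedPartition` and `Event` are hypothetical names, the timestamp is simplified to a `long`, and it assumes (as the DDL suggests) that the DELETE runs only when an insert would push the partition past its limit, with the insert rejected if the delete frees no space.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory simulation of one partition of the EVENTS table (not VoltDB code).
public class CappedPartition {
    // One row: (WHEN_OCCURRED, INFO), with the timestamp simplified to a long.
    record Event(long whenOccurred, String info) {}

    private final int rowLimit;                      // LIMIT PARTITION ROWS n
    private final List<Event> rows = new ArrayList<>();

    CappedPartition(int rowLimit) {
        this.rowLimit = rowLimit;
    }

    // Mirrors DELETE FROM EVENTS ORDER BY WHEN_OCCURRED, INFO LIMIT 1:
    // removes the single oldest row, breaking timestamp ties on INFO.
    private void deleteOldest() {
        rows.stream()
            .min(Comparator.comparingLong(Event::whenOccurred).thenComparing(Event::info))
            .ifPresent(rows::remove);
    }

    // The EXECUTE clause fires only when this insert would exceed the limit;
    // if the delete frees no space, the insert is rejected.
    void insert(Event e) {
        if (rows.size() >= rowLimit) {
            deleteOldest();
        }
        if (rows.size() >= rowLimit) {
            throw new IllegalStateException("partition row limit exceeded");
        }
        rows.add(e);
    }

    int size() {
        return rows.size();
    }

    // Smallest WHEN_OCCURRED still stored, or -1 if the partition is empty.
    long oldest() {
        return rows.stream().mapToLong(Event::whenOccurred).min().orElse(-1);
    }

    public static void main(String[] args) {
        CappedPartition events = new CappedPartition(100);
        for (long t = 1; t <= 250; t++) {
            events.insert(new Event(t, "event-" + t));
        }
        System.out.println(events.size() + " rows, oldest = " + events.oldest());
    }
}
```

Running `main` inserts 250 events into a 100-row partition and prints `100 rows, oldest = 151`: only the most recent window survives. Ordering the DELETE by every column (WHEN_OCCURRED, then INFO) makes the choice of victim row deterministic even when timestamps tie.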
TRADITIONAL DEVELOPMENT MODEL
• Start an empty database
• Catalog no longer needed!
$ sqlcmd
SQL Command :: localhost:21212
1> CREATE TABLE contestants
2> (
3> contestant_number integer NOT NULL
4> , contestant_name varchar(50) NOT NULL
5> , CONSTRAINT PK_contestants PRIMARY KEY
6> (
7> contestant_number
8> )
9> );
TRADITIONAL DEVELOPMENT MODEL (CONT)
• ALTER TABLE
• CREATE INDEX/PROCEDURE/ROLE/TABLE/VIEW
• DROP INDEX/PROCEDURE/TABLE/VIEW
• JDBC and ODBC drivers
VOLTDB MANAGEMENT CENTER (VMC)
DOWNLOAD V5.0 TODAY
• VoltDB Community Edition
• Open Source, available on github.com/voltdb
• VoltDB Enterprise Edition
• Production-ready
• Fully durable, highly available
• voltdb.com/download/software
VoltDB runs over 6 BILLION transactions/day in production!
HORTONWORKS
Ajay Singh
Hortonworks
Director of Technical Channels
Hortonworks Confidential 2014
BIG DATA: LAMBDA ARCHITECTURE
Key Tenets of Lambda Architecture
• Batch Layer
  • Manages master data
  • Immutable, append-only set of raw data
  • Cleanse, Normalize & Pre-Compute Batch Views
  • Advanced Statistical Calculations
• Speed Layer
  • Real-Time Event Stream Processing
  • Computes Real-Time Views
• Serving Layer
  • Low-latency, ad-hoc query
  • Reporting, BI & Dashboard
[Diagram: New Data Stream feeds both layers. Batch Layer: All Data (HDFS) → Pre-Compute Views & Deep Analytics → Business View. Speed Layer: Process Streams → Incremental Views → Business View. Serving Layer: Query.]
BIG DATA WITH HDP & VOLTDB
HDP DELIVERS A COMPREHENSIVE DATA MANAGEMENT PLATFORM
Hortonworks Data Platform 2.2
[Diagram: YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System). Batch, Interactive & Real-Time Data Access: Script (Pig), SQL (Hive) and Java/Scala (Cascading) on Tez; Stream (Storm); Search (Solr); NoSQL (HBase, Accumulo) with Slider; In-Memory (Spark); Others (ISV Engines). Governance: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS). Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie). Security: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox; Cluster: Ranger). Deployment Choice: Linux, Windows, On-Premises, Cloud.]
• YARN is the architectural center of HDP
• Enables batch, interactive and real-time workloads
• Provides comprehensive enterprise capabilities
• The widest range of deployment options
Delivered Completely in the OPEN
VOLTDB FAST DATA DEMO
John Hugg
VoltDB Founding Engineer
The Lambda Architecture
LAMBDA OVERVIEW
• Batch processing is well understood and robust. Latency is pretty horrific.
• Stream processing is immediate. It is complex and not as robust to hardware or user failure.
• The Lambda Architecture says to do both in parallel to compensate: a Speed Layer and a Batch Layer.
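The "do both in parallel" idea can be sketched as a serving layer that merges two views at query time: a batch view recomputed from all data, plus a real-time view of increments since the last batch run. This is a minimal illustration with hypothetical names, not any particular product's API, and it simplifies by assuming no events arrive while a batch run is in progress, so the real-time view can simply be cleared when the batch completes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical serving layer that answers queries by merging two views.
public class LambdaServing {
    // Batch view: per-key counts recomputed from all raw data at the last batch run.
    private final Map<String, Long> batchView = new HashMap<>();
    // Real-time view: per-key increments for events seen since that run.
    private final Map<String, Long> realtimeView = new HashMap<>();

    // Speed layer: update the incremental view as each event arrives.
    void onEvent(String key) {
        realtimeView.merge(key, 1L, Long::sum);
    }

    // Batch layer finished: swap in the recomputed view and drop the increments
    // it now covers (assumes no events arrived while the batch was running).
    void onBatchComplete(Map<String, Long> recomputed) {
        batchView.clear();
        batchView.putAll(recomputed);
        realtimeView.clear();
    }

    // Serving layer: combine both views at query time.
    long query(String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        LambdaServing serving = new LambdaServing();
        serving.onEvent("app-87");
        serving.onEvent("app-87");
        System.out.println(serving.query("app-87")); // 2, all from the speed layer
        serving.onBatchComplete(Map.of("app-87", 2L));
        serving.onEvent("app-87");
        System.out.println(serving.query("app-87")); // 3 = 2 (batch) + 1 (speed)
    }
}
```

Simple addition works here only because plain counts are additive. Distinct counts are not: the same user can appear in both views, so a unique-user question needs a mergeable structure rather than a sum.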
EXAMPLE LAMBDA STACK
Speed Layer
Batch Layer
EXAMPLE PROBLEM
HOW MANY PEOPLE USED MY APP TODAY?
HOW MANY UNIQUE USERS INTERACTED WITH MY APP TODAY?
[Diagram: opening the Cupcake Time app sends an event carrying its App Identifier (appid = 87) and a Unique Device ID (deviceid = 12).]
[Diagram: the same Cupcake Time event (appid = 87, deviceid = 12) flowing into the Lambda Architecture.]
1 MILLION (APPID, DEVICEID) PAIRS PER SECOND
Enter HyperLogLog
A method of estimating cardinality (the number of distinct values).
blob = update(integer, blob)
integer = estimate(blob)
Fixed blob size.
A few kilobytes to get 99% accuracy.
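The update/estimate interface above can be illustrated with a compact HyperLogLog in Java. This is a sketch under stated assumptions, not the demo's code: the SplitMix64 finalizer stands in as the hash function, each of the 2^14 registers takes one byte (a 16 KB blob; packing registers into 6 bits would bring it to about 12 KB), and small cardinalities use the standard linear-counting correction. With 2^14 registers the expected relative standard error is about 1.04/√16384 ≈ 0.8%.

```java
// Minimal HyperLogLog sketch: a fixed-size byte[] blob of registers.
public final class HyperLogLogSketch {
    static final int P = 14;          // 2^P registers
    static final int M = 1 << P;      // 16,384 one-byte registers = 16 KB blob

    // SplitMix64 finalizer, used here as a 64-bit hash (an assumed choice).
    static long hash(long x) {
        long z = x + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // blob = update(integer, blob): fold one value into the blob (null starts a new one).
    static byte[] update(long value, byte[] blob) {
        if (blob == null) {
            blob = new byte[M];
        }
        long h = hash(value);
        int idx = (int) (h >>> (64 - P));             // top P bits choose a register
        // 1-based position of the first 1-bit in the remaining 64 - P bits
        int rank = Math.min(Long.numberOfLeadingZeros(h << P) + 1, 64 - P + 1);
        if (rank > blob[idx]) {
            blob[idx] = (byte) rank;                  // each register keeps its maximum rank
        }
        return blob;
    }

    // integer = estimate(blob): alpha * m^2 / sum(2^-register), switching to
    // linear counting while many registers are still empty.
    static long estimate(byte[] blob) {
        double sum = 0;
        int zeros = 0;
        for (byte r : blob) {
            sum += Math.pow(2, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1 + 1.079 / M);
        double e = alpha * M * M / sum;
        if (e <= 2.5 * M && zeros > 0) {
            e = M * Math.log((double) M / zeros);
        }
        return Math.round(e);
    }

    public static void main(String[] args) {
        byte[] blob = null;
        // 1,000,000 updates over 250,000 distinct ids (each id seen four times)
        for (long i = 0; i < 1_000_000; i++) {
            blob = update(i % 250_000, blob);
        }
        System.out.println("blob size: " + blob.length + " bytes");
        System.out.println("estimate: " + estimate(blob) + " (exact: 250000)");
    }
}
```

Because each register only keeps a maximum, re-adding a value that was already seen leaves the blob unchanged, so repeat visits by the same device never inflate the estimate, and the blob stays the same size no matter how many values pass through it.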
DECLARE SQL STATEMENTS
PARAMS ARE APP ID & DEVICE ID
GET ROW FOR THIS APP ID FROM STATE
CREATE A HYPERLOGLOG STRUCTURE FROM THE ROW
OR CREATE A NEW HLL IF NO ROW
ADD THIS UNIQUE ID TO THE HLL STRUCTURE
UPDATE ROW WITH NEW HLL BYTES AND THE COMPUTED ESTIMATE
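The procedure steps above amount to a read-modify-write of one row per app. In VoltDB this logic would live in a Java stored procedure (a `VoltProcedure` subclass with declared `SQLStmt` objects); the sketch below instead uses a `HashMap` as a stand-in for the `estimates` table so it runs without a server. The `hll` column, the class and method names, the 2^12-register HyperLogLog, and the SplitMix64 hash are all assumptions for illustration, not the demo's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class CountDeviceSketch {
    // Stand-in for a row of estimates (appid, devicecount, hll); the hll column is assumed.
    record EstimateRow(long appid, long devicecount, byte[] hll) {}

    // Stand-in for the estimates table, keyed by appid.
    private final Map<Long, EstimateRow> estimates = new HashMap<>();

    private static final int P = 12;          // 2^12 one-byte registers = 4 KB per app
    private static final int M = 1 << P;

    // SplitMix64 finalizer as the hash (an assumed choice).
    private static long hash(long x) {
        long z = x + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Standard HLL update: top P bits pick a register, which keeps the max rank.
    private static void addToHll(byte[] regs, long value) {
        long h = hash(value);
        int idx = (int) (h >>> (64 - P));
        int rank = Math.min(Long.numberOfLeadingZeros(h << P) + 1, 64 - P + 1);
        if (rank > regs[idx]) regs[idx] = (byte) rank;
    }

    // Standard HLL estimate with linear counting for small cardinalities.
    private static long estimateHll(byte[] regs) {
        double sum = 0;
        int zeros = 0;
        for (byte r : regs) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1 + 1.079 / M);
        double e = alpha * M * M / sum;
        if (e <= 2.5 * M && zeros > 0) e = M * Math.log((double) M / zeros);
        return Math.round(e);
    }

    // Params are app id & device id; returns the updated estimate.
    long run(long appid, long deviceid) {
        // Get the row for this app id from the state table.
        EstimateRow row = estimates.get(appid);
        // Rebuild the HLL from the row's bytes, or create a new HLL if there is no row.
        byte[] regs = (row == null) ? new byte[M] : row.hll().clone();
        // Add this unique id to the HLL structure.
        addToHll(regs, deviceid);
        // Update (upsert) the row with the new HLL bytes and the computed estimate.
        long count = estimateHll(regs);
        estimates.put(appid, new EstimateRow(appid, count, regs));
        return count;
    }

    public static void main(String[] args) {
        CountDeviceSketch proc = new CountDeviceSketch();
        for (long device = 0; device < 5000; device++) {
            proc.run(87, device);
            proc.run(87, device); // a repeat visit by the same device adds nothing
        }
        System.out.println("app 87 estimate: " + proc.run(87, 12));
    }
}
```

Keeping the whole read-modify-write inside one procedure is what makes this safe in VoltDB: each procedure invocation runs as an ACID transaction, so two updates for the same app cannot interleave between the read and the write.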
ADVANTAGES
LESS COMPLEX OPERATIONALLY
[Diagram: two stacks compared side by side, labeled “vs.”]
LESS CODE IN FEWER PLACES
• HyperLogLog code is used entirely within one stored procedure.
• Client uses SQL + a simple schema for queries & reporting.
Less Complex Development
SELECT appid, devicecount
FROM estimates
ORDER BY devicecount DESC
LIMIT 10;
DEMO
THANK YOU!