INTRODUCING: CREATE PIPELINE
Exactly-once semantics with Kafka
and MemSQL Pipelines
OCTOBER 2016
Gary Orenstein, CMO
Steven Camiña, Product Manager
How important are real-time initiatives to your organization?
[Chart: response options Not Important, Critical, Somewhat Important, Very Important, and Important; values shown: 32%, 29%, 22%, 14%, 3%]
Source: MemSQL State of Real Time Survey 2016, n=125
Why are you pursuing real-time initiatives?
[Chart: reasons Other; Drive new revenue opportunities; Increase efficiency and reduce cost; Power new applications (example: IoT data processing); Improve customer experience; Deliver real-time dashboards; values shown: 52%, 52%, 43%, 36%, 29%, 4%]
Source: MemSQL State of Real Time Survey 2016, n=125
Want real-time dashboards and a better customer experience?
Start with fresh data
A Modern Approach to Real-Time Data
Streaming, Database, Data Warehouse
Streaming, database, and data warehouse workloads
Real-time pipelines, OLTP, and OLAP
• OLTP: High-Volume Transactions
• OLAP: Fast, Scalable SQL Analytics
• Real-time Pipelines
The Database Platform for Real-Time Analytics
Real-Time
• Fast Ingest
• Low-Latency Queries
• High Concurrency
Scalable
• Horizontal Scale
• On-premises or Cloud
• Highly Available and Secure
Proven
• ANSI SQL
• Transactional and Relational
• Universal Ecosystem
Today’s Streaming Discussion
Web logs, mobile apps, IoT, sensors
The Enterprise Opportunity – Past and Future
Web logs, mobile apps, IoT, sensors
Take existing batch processes real-time
Nothing Closer to Real Time Than Streaming
• Let’s look at the leading edge
• Apache Kafka
• Messaging semantics
  • At most once
  • At least once
  • Exactly once
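At most once
[Diagram: a single message, marked with a question mark because it may not arrive]
At least once
[Diagram: the same message repeated many times]
Exactly-once
[Diagram: a single message]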
Understanding Streaming Semantics
At most once
• Message pulled once
• May or may not be received
• No duplicates
• Possible missing data
At least once
• Message pulled one or more times; processed each time
• Receipt guaranteed
• Likely duplicates
• No missing data
Exactly-once
• Message pulled one or more times; processed once
• Receipt guaranteed
• No duplicates
• No missing data
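To make the comparison concrete, here is a small Python simulation, a sketch rather than how Kafka or MemSQL implement delivery. It replays one hypothetical four-message stream and injects a single crash at offset 1. The only difference between the three modes is ordering: commit the offset before processing (at most once), process before committing (at least once), or apply the output and the offset in one atomic step (exactly-once). The names `MESSAGES` and `run` are illustrative.

```python
from typing import List

# Hypothetical stream: the message at index i has offset i.
MESSAGES: List[str] = ["m0", "m1", "m2", "m3"]


def run(semantics: str, crash_at: int) -> List[str]:
    """Consume MESSAGES, simulating one crash while handling offset `crash_at`.

    Returns everything written to the sink, including duplicates.
    """
    sink: List[str] = []   # output visible to downstream readers
    committed = 0          # offset a restarted consumer resumes from
    crashed = False        # crash only once
    while committed < len(MESSAGES):
        offset = committed
        msg = MESSAGES[offset]
        crash_now = offset == crash_at and not crashed
        if semantics == "at_most_once":
            committed = offset + 1          # commit the offset first...
            if crash_now:                   # ...so a crash before processing loses msg
                crashed = True
                continue
            sink.append(msg)
        elif semantics == "at_least_once":
            sink.append(msg)                # process first...
            if crash_now:                   # ...so a crash before commit redelivers msg
                crashed = True
                continue
            committed = offset + 1
        elif semantics == "exactly_once":
            staged = sink + [msg]           # stage output and offset together
            if crash_now:                   # a crash discards the staged work
                crashed = True
                continue
            sink, committed = staged, offset + 1  # apply both atomically
        else:
            raise ValueError(f"unknown semantics: {semantics}")
    return sink


for s in ("at_most_once", "at_least_once", "exactly_once"):
    print(s, run(s, crash_at=1))
# at_most_once ['m0', 'm2', 'm3']
# at_least_once ['m0', 'm1', 'm1', 'm2', 'm3']
# exactly_once ['m0', 'm1', 'm2', 'm3']
```

The at-least-once run shows where "likely duplicates" comes from: the message processed just before the crash is processed again after the restart.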
MemSQL Pipelines
CREATE TABLE
CREATE PIPELINE
Introducing MemSQL Pipelines
• CREATE PIPELINE is a database construct that enables data ingestion with exactly-once semantics
  • MemSQL stores the Kafka offset in a table
  • Exactly-once delivery is facilitated by co-locating data and offsets
• Extract, transform, and load external data natively
• Fully distributed workloads
• User-defined transformations
• Scalable, highly performant, online ALTER TABLE and ALTER PIPELINE
MemSQL Pipelines Sequence
1. Extract from data sources
2. Transform extracted data
3. Load transformed data into database tables in parallel
[Diagram: Data Sources → MemSQL Pipelines (1. Extract, 2. Transform extracted data, 3. Load into database tables)]
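The three steps can be sketched in Python under simple assumptions: the data source is an in-memory list, the user-defined transform is an ordinary Python function (the slides do not describe MemSQL's actual transform interface), and the target table is simulated as one list per partition, loaded concurrently with a thread pool. The names `parse_purchase` and `run_pipeline` and the `user,amount` record format are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


def extract(source: Iterable[str]) -> List[str]:
    """1. Extract: read raw records from a data source (an in-memory list here)."""
    return list(source)


def parse_purchase(record: str) -> Row:
    """Hypothetical user-defined transform: 'user,amount' -> typed row."""
    user, amount = record.split(",")
    return {"user": user.strip(), "amount": float(amount)}


def load(tables: Dict[int, List[Row]], part: int, rows: List[Row]) -> int:
    """3. Load: append rows to one partition of a simulated table."""
    tables[part].extend(rows)
    return len(rows)


def run_pipeline(
    source: Iterable[str], transform: Callable[[str], Row], n_parts: int = 3
) -> Tuple[Dict[int, List[Row]], int]:
    raw = extract(source)
    rows = [transform(r) for r in raw]                      # 2. Transform
    tables: Dict[int, List[Row]] = {p: [] for p in range(n_parts)}
    chunks = [rows[p::n_parts] for p in range(n_parts)]    # one slice per partition
    with ThreadPoolExecutor(max_workers=n_parts) as pool:   # load slices in parallel
        loaded = sum(pool.map(load, [tables] * n_parts, range(n_parts), chunks))
    return tables, loaded


tables, loaded = run_pipeline(["alice,10", "bob, 2.5", "carol,7"], parse_purchase)
print(loaded)  # 3
```

Each worker writes only to its own partition list, so the parallel load needs no locking in this sketch.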
MemSQL Pipelines Architecture: Kafka Example
[Diagram: three Kafka brokers, each paired with a MemSQL node running Pipelines; each node performs 1. Extract, 2. Transform, 3. Load. A MemSQL master also runs Pipelines. Arrows are labeled "Data reshuffle" and "Metadata query".]
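The slide labels "Data reshuffle" and "Metadata query" without explaining them, so this Python sketch rests on stated assumptions: the master learns the list of Kafka partitions from a metadata query and spreads them round-robin over the nodes; each node extracts and transforms its own partitions; rows are then reshuffled to the node that owns their shard key, chosen by a stable CRC32 hash. Node names, the routing rule, and the transform are hypothetical, and the loop runs sequentially for clarity where real nodes would work in parallel.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Tuple[str, int]  # (shard_key, value)


def assign_partitions(partitions: List[int], nodes: List[str]) -> Dict[str, List[int]]:
    """Master side (assumed): spread the Kafka partitions returned by a
    metadata query round-robin over the nodes so each node extracts its share."""
    plan: Dict[str, List[int]] = defaultdict(list)
    for i, p in enumerate(partitions):
        plan[nodes[i % len(nodes)]].append(p)
    return dict(plan)


def owner(shard_key: str, nodes: List[str]) -> str:
    """Assumed routing rule: a stable hash of the shard key picks the owning node."""
    return nodes[zlib.crc32(shard_key.encode()) % len(nodes)]


def run_cluster(kafka: Dict[int, List[Message]], nodes: List[str]) -> Dict[str, List[Message]]:
    plan = assign_partitions(sorted(kafka), nodes)
    stored: Dict[str, List[Message]] = {n: [] for n in nodes}
    for node, parts in plan.items():
        for p in parts:
            for key, value in kafka[p]:                # 1. extract on `node`
                row = (key, value * 10)                # 2. transform (hypothetical)
                stored[owner(key, nodes)].append(row)  # reshuffle, then 3. load
    return stored


kafka = {0: [("u1", 1), ("u2", 2)], 1: [("u3", 3)], 2: [("u1", 4)]}
nodes = ["leaf-1", "leaf-2", "leaf-3"]
stored = run_cluster(kafka, nodes)
for node, rows in stored.items():
    print(node, rows)
```

The two rows for key `u1` arrive from different Kafka partitions yet end up on the same node, which is what a reshuffle by shard key is meant to guarantee.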
MemSQL Pipelines Demo
Demo Architecture
[Diagram: Twitter → Kafka → Pipelines → MemSQL → SQL insights → visualization dashboard]
THANK YOU!
www.memsql.com/download
