INTRODUCING: CREATE PIPELINE

Exactly-once semantics with Kafka
and MemSQL Pipelines
OCTOBER 2016
Gary Orenstein, CMO
Steven Camiña, Product Manager

How important are real-time initiatives to your organization?
Important: 32%
Very Important: 29%
Somewhat Important: 22%
Critical: 14%
Not Important: 3%
MemSQL State of Real Time Survey 2016, n=125

Why are you pursuing real-time initiatives?
Deliver real-time dashboards: 52%
Improve customer experience: 52%
Power new applications (Example: IoT data processing): 43%
Increase efficiency and reduce cost: 36%
Drive new revenue opportunities: 29%
Other: 4%
MemSQL State of Real Time Survey 2016, n=125

Want real-time dashboards and
a better customer experience?
Start with fresh data

A Modern Approach to Real-Time Data
Streaming, Database, Data Warehouse

Streaming, database, and data warehouse workloads
Database
Data Warehouse
Streaming

Real-time pipelines, OLTP, and OLAP
Real-time Pipelines
OLTP: High Volume Transactions
OLAP: Fast, Scalable SQL Analytics

The Database Platform For Real-Time Analytics
Real-Time
• Fast Ingest
• Low Latency Queries
• High Concurrency
Scalable
• Horizontal Scale
• On-premises or Cloud
• Highly Available and Secure
Proven
• ANSI SQL
• Transactional and relational
• Universal Ecosystem

Today’s Streaming Discussion
Web logs, Mobile apps, IoT, Sensors

The Enterprise Opportunity – Past and Future
Web logs, Mobile apps, IoT, Sensors
Take Existing Batch Processes Real-Time

Nothing Closer To Real Time Than Streaming
• Let’s look at the leading edge
• Apache Kafka
• Messaging Semantics
  • At most once
  • At least once
  • Exactly once

At least once
[Figure: repeated copies of message 000]

Understanding Streaming Semantics
At most once
• Message pulled once
• May or may not be received
• No duplicates
• Possible missing data

At most once
• Message pulled once
• May or may not be received
• No duplicates
• Possible missing data
At least once
• Message pulled one or more times; processed each time
• Receipt guaranteed
• Likely duplicates
• No missing data

At most once
• Message pulled once
• May or may not be received
• No duplicates
• Possible missing data
At least once
• Message pulled one or more times; processed each time
• Receipt guaranteed
• Likely duplicates
• No missing data
Exactly-once
• Message pulled one or more times; processed once
• Receipt guaranteed
• No duplicates
• No missing data
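
The three delivery guarantees compared above can be illustrated with a small simulation. This is a sketch, not Kafka client code: the function `deliver`, the `fail_rate` parameter, and the in-memory `committed` set are hypothetical names chosen for illustration, and the failure model (a pull or an acknowledgement is lost at random) is an assumption.

```python
import random


def deliver(messages, mode, fail_rate=0.3, seed=7):
    """Simulate one consumer under a delivery mode (illustrative model only).

    Returns the list of messages the application ends up processing.
    """
    if mode not in ("at-most-once", "at-least-once", "exactly-once"):
        raise ValueError("unknown mode: " + mode)
    rng = random.Random(seed)
    processed = []
    committed = set()  # offsets recorded together with the processed data
    for offset, msg in enumerate(messages):
        if mode == "at-most-once":
            # Pulled once; if the pull fails, the message is never retried.
            if rng.random() >= fail_rate:
                processed.append(msg)
            continue
        while True:
            # The message is pulled, possibly again after a lost acknowledgement.
            if mode == "at-least-once" or offset not in committed:
                processed.append(msg)
                committed.add(offset)
            if rng.random() >= fail_rate:
                break  # acknowledgement got through; move to the next message
    return processed


if __name__ == "__main__":
    msgs = list(range(20))
    for mode in ("at-most-once", "at-least-once", "exactly-once"):
        out = deliver(msgs, mode)
        print(mode, "processed", len(out), "distinct", len(set(out)))
```

In this model at-most-once can only lose messages, at-least-once can only duplicate them, and exactly-once processes each message a single time because a repeated pull is recognized by its already-recorded offset.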

Introducing MemSQL Pipelines
• CREATE PIPELINE is a database construct that enables data ingestion with exactly-once semantics
  • MemSQL stores the Kafka offset in a table
  • Exactly-once delivery is facilitated by co-locating data and offsets
• Extract, transform, and load external data natively
• Fully distributed workloads
• User-defined transformations
• Scalable, highly performant, online ALTER TABLE and ALTER PIPELINE
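
Why co-locating data and offsets gives exactly-once delivery can be shown with a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in database and a plain list as a stand-in for one Kafka partition; the table names `events` and `pipeline_offsets`, the function names, and the `crash_before_commit` flag are all hypothetical and are not MemSQL's internal schema or API. The point is only that the rows and the new offset are written in one transaction, so a failure rolls back both and the retry resumes from the old offset.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a data table and an offsets table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (kafka_offset INTEGER, payload TEXT)")
    conn.execute(
        "CREATE TABLE pipeline_offsets "
        "(source_partition INTEGER PRIMARY KEY, next_offset INTEGER)"
    )
    conn.execute("INSERT INTO pipeline_offsets VALUES (0, 0)")
    conn.commit()
    return conn


def load_batch(conn, log, batch_size, crash_before_commit=False):
    """Load the next batch from `log` and advance the stored offset atomically."""
    (start,) = conn.execute(
        "SELECT next_offset FROM pipeline_offsets WHERE source_partition = 0"
    ).fetchone()
    batch = log[start:start + batch_size]
    try:
        with conn:  # one transaction: both writes commit, or both roll back
            conn.executemany(
                "INSERT INTO events VALUES (?, ?)",
                [(start + i, payload) for i, payload in enumerate(batch)],
            )
            conn.execute(
                "UPDATE pipeline_offsets SET next_offset = ? "
                "WHERE source_partition = 0",
                (start + len(batch),),
            )
            if crash_before_commit:
                raise RuntimeError("simulated crash before commit")
    except RuntimeError:
        return 0  # nothing was stored, and the offset did not move
    return len(batch)


def demo():
    log = ["a", "b", "c", "d", "e"]
    conn = open_db()
    load_batch(conn, log, 2)                            # loads a, b
    load_batch(conn, log, 2, crash_before_commit=True)  # c, d are rolled back
    load_batch(conn, log, 2)                            # retry loads c, d once
    load_batch(conn, log, 2)                            # loads e
    rows = conn.execute("SELECT payload FROM events ORDER BY kafka_offset")
    return [payload for (payload,) in rows]


if __name__ == "__main__":
    print(demo())
```

If the offset were stored somewhere other than the database (for example, committed back to the broker separately), a crash between the two writes would leave them out of step, which is where duplicates or gaps come from.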

MemSQL Pipelines Sequence
1. Extract from data sources
2. Transform extracted data
3. Load transformed data into database tables in parallel
[Diagram: Data Sources → Pipelines (1. Extract, 2. Transform, 3. Load) → MemSQL]
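
The extract, transform, load sequence above can be sketched in a few lines. This is a toy model under stated assumptions: lists stand in for source partitions and for database tables, the `transform` shown (trim and upper-case) is an arbitrary placeholder for a user-defined transformation, and a thread pool stands in for the parallel load. None of the names are MemSQL APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def extract(source):
    """Step 1: pull raw records from one source partition."""
    return list(source)


def transform(records):
    """Step 2: placeholder user-defined transform (trim and upper-case)."""
    return [record.strip().upper() for record in records]


def load(table, records):
    """Step 3: append transformed records to the destination table."""
    table.extend(records)
    return len(records)


def run_pipeline(sources):
    """Run the three steps for every source partition in parallel."""
    tables = [[] for _ in sources]  # one destination per source partition

    def run_one(i):
        return load(tables[i], transform(extract(sources[i])))

    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        counts = list(pool.map(run_one, range(len(sources))))
    return tables, counts


if __name__ == "__main__":
    tables, counts = run_pipeline([[" a ", "b"], ["c "]])
    print(tables, counts)
```

Each worker writes only to its own destination list, so the parallel loads do not contend with one another.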

MemSQL Pipelines Architecture: Kafka Example
[Diagram: three Kafka Brokers; a MemSQL Master and a MemSQL Node, each running Pipelines; stages 1. Extract, 2. Transform, 3. Load; arrows labeled "Metadata query" and "Data reshuffle"]
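
One way to read the architecture above is that a metadata query discovers the Kafka partitions, the partitions are divided among the database nodes for parallel extraction, and loaded rows are reshuffled to the database partition that owns them. The sketch below illustrates that reading only. The round-robin assignment, the CRC32 hash, and the function names are assumptions made for illustration; the slides do not state which policy or hash MemSQL uses.

```python
import zlib


def assign_partitions(kafka_partitions, nodes):
    """Divide Kafka partitions among nodes round-robin (assumed policy)."""
    plan = {node: [] for node in nodes}
    for i, partition in enumerate(kafka_partitions):
        plan[nodes[i % len(nodes)]].append(partition)
    return plan


def reshuffle(rows, key, num_db_partitions):
    """Route each row to a database partition by hashing its key (assumed hash)."""
    routed = {i: [] for i in range(num_db_partitions)}
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        routed[digest % num_db_partitions].append(row)
    return routed


if __name__ == "__main__":
    print(assign_partitions([0, 1, 2, 3, 4], ["node1", "node2"]))
    rows = [{"user": "a", "n": 1}, {"user": "b", "n": 2}, {"user": "a", "n": 3}]
    print(reshuffle(rows, "user", 4))
```

Rows with the same key always land in the same database partition, whichever node extracted them.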

Demo Architecture
[Diagram: Twitter → Kafka → Pipelines → MemSQL → SQL Insights → Visualization Dashboard]

THANK YOU!
www.memsql.com/download
