Real-Time Analytics with Confluent and MemSQL

Hans Jespersen and Steven Camiña
August 11, 2016
Enabling Real-Time Analytics for IoT

The Rise of Real-Time Analytics
• On-demand economy
• Internet of Things
• New technologies

Industries that Need Real Time
• Auto and Transportation
• Delivery
• Energy
• Warehousing and Logistics
• Manufacturing
• Healthcare

Architecting for Real-Time Analytics
[Diagram: Data Producers (simulating sensor activity), gateways, Message Queue, Data Transformation, Database, User Interface]
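
The stages named in this architecture can be sketched end to end with standard-library stand-ins. This is only an illustration under stated assumptions: `queue.Queue` stands in for the message queue, an in-memory SQLite database stands in for the database, and the record fields (`sensor_id`, `temp_c`, `temp_f`) and function names are hypothetical, not taken from the slides.

```python
import json
import queue
import sqlite3


def produce(message_queue, readings):
    """Data producer: serialize each reading and put it on the queue."""
    for reading in readings:
        message_queue.put(json.dumps(reading))


def transform(raw_message):
    """Data transformation: parse the message and derive a Fahrenheit value."""
    record = json.loads(raw_message)
    record["temp_f"] = record["temp_c"] * 9 / 5 + 32
    return record


def run_pipeline(readings):
    """Move readings through queue -> transformation -> database."""
    message_queue = queue.Queue()      # stand-in for the message queue
    db = sqlite3.connect(":memory:")   # stand-in for the database
    db.execute("CREATE TABLE readings (sensor_id TEXT, temp_c REAL, temp_f REAL)")

    produce(message_queue, readings)
    while not message_queue.empty():
        record = transform(message_queue.get())
        db.execute(
            "INSERT INTO readings VALUES (?, ?, ?)",
            (record["sensor_id"], record["temp_c"], record["temp_f"]),
        )
    db.commit()
    return db


db = run_pipeline([
    {"sensor_id": "s1", "temp_c": 20.0},
    {"sensor_id": "s2", "temp_c": 100.0},
])
# A user interface would issue queries like this one against the database.
rows = db.execute(
    "SELECT sensor_id, temp_f FROM readings ORDER BY sensor_id"
).fetchall()
```

In a real deployment each stage runs as its own service, so the queue decouples producers from the database and lets ingest and transformation scale independently.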

High-Speed Ingest
[Diagram: Data Producers (simulating sensor activity), gateways, Message Queue, Data Transformation, Database, User Interface]

About Confluent and Apache Kafka
• Founded by the creators of Apache Kafka
• Founded September 2014
• Technology developed while at LinkedIn
• 73% of active Kafka committers
Leadership
• Cheryl Dalrymple, CFO
• Jay Kreps, CEO
• Neha Narkhede, CTO, VP Engineering
• Luanne Dauber, CMO
• Todd Barnett, VP WW Sales
• Jabari Norton, VP Business Dev

What is a Stream Data Platform?
[Diagram: Kafka as the Stream Data Platform, connected to Apps, Search, NoSQL, RDBMS, Monitoring, Stream Processing, Real-time Analytics, Data Warehouse, and Hadoop]
• Synchronous Req/Response: 0 – 100s ms
• Near Real Time: > 100s ms
• Offline Batch: > 1 hour
Build streaming applications
Deploy streaming applications at scale
Monitor and manage streaming applications
Common Kafka Use Cases
• Log data
• Database changes
• Sensors and device data
• Monitoring streams
• Call data records
• Monitoring
• Asynchronous applications
• Fraud and security
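
One reason a single Kafka stream can serve many of these use cases at once is that Kafka keeps records in an append-only log and each consumer group tracks its own read position (offset). The toy class below sketches that idea in memory; `TopicLog`, its method names, and the group names are hypothetical and are not the Kafka client API.

```python
class TopicLog:
    """Toy append-only log; each consumer group tracks its own read offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group name -> next offset to read

    def append(self, record):
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return unread records for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for event in ["login", "purchase", "logout"]:
    log.append(event)

# Two independent consumers read the same records at their own pace.
search_batch = log.poll("search-indexer")
warehouse_first = log.poll("data-warehouse", max_records=2)
warehouse_rest = log.poll("data-warehouse")
```

Because reading does not remove records, a slow batch consumer and a fast real-time consumer can share one stream without interfering with each other.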

Confluent Platform
[Diagram: Confluent Platform architecture]
• Data sources: Database Changes, Mobile Devices, IoT, Logs, Website Events
• Data Integration through Connectors (Source, Sink): Hadoop, ERP, CRM, Data Warehouse, RDBMS
• Platform: Apache Kafka Core, Connectors, Kafka Streams, Control Center, REST Proxy & Schema Registry
• Real-Time Applications: Alerting, Monitoring, Real-time Analytics, Custom Application, Transformations
• Support, Services and Consulting
• Legend: Confluent Platform, Confluent Platform Enterprise, External Product

Confluent Control Center
Configures Kafka Connect data pipelines
Monitors all pipelines from end-to-end

Architecting for IoT Streaming Data Ingestion
[Diagram: Data Producers (simulating sensor activity); REST, MQTT, and WSS gateways; Kafka Cluster (Confluent Streaming Data Platform); Data Transformation; Database; User Interface]
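
A gateway layer like the one in this diagram typically has to turn readings that arrive over different protocols into one record shape before they reach the Kafka cluster, and keyed records are then assigned to partitions by hashing the key. The sketch below illustrates both steps. The envelope fields, the partition count, and the use of CRC32 are assumptions for illustration; Kafka clients apply their own partitioning hash.

```python
import json
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count for the sensor topic


def to_envelope(protocol, sensor_id, payload):
    """Wrap a reading from any gateway protocol in one common record shape."""
    if protocol not in {"REST", "MQTT", "WSS"}:
        raise ValueError(f"unsupported protocol: {protocol}")
    return {"key": sensor_id, "value": json.dumps({"via": protocol, **payload})}


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# The same sensor reporting over two protocols lands on the same partition.
a = to_envelope("MQTT", "sensor-17", {"temp_c": 21.5})
b = to_envelope("REST", "sensor-17", {"temp_c": 21.7})
same_partition = choose_partition(a["key"]) == choose_partition(b["key"])
```

Keying by sensor ID keeps each sensor's readings in order within one partition, whichever gateway they came through.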

Fast, Performant Data Storage
[Diagram: Data Producers (simulating sensor activity), gateways, Message Queue, Data Transformation, Database, User Interface]

Designed for Modern Operational Workloads
Scalable SQL
▪ Multi-mode
• OLTP, OLAP, HTAP
▪ Multi-model
• ANSI SQL
• Document/JSON
• Geospatial
In-Memory and Solid-State
▪ In-Memory rowstore
▪ Solid-state columnstore
▪ Stream directly to rowstore or columnstore
Distributed
▪ Distributed query optimizer and execution
▪ Scale-out on commodity hardware
Datacenter or Cloud
▪ Deploy on-premises
▪ Cloud agnostic
• Amazon
• Microsoft
• Google
• Digital Ocean
Simple, Real-Time, Low Cost, Flexible

Real-Time Processing Features
▪ Ecosystem Compatibility
• MySQL Wire Protocol
• Stream processing through Integrated Apache Spark
▪ In-Memory Performance
• Code Compilation for SQL queries
• Maximum Concurrency with Lock-free components
• Full Data Durability and High Availability
▪ Distributed System Processing
• Distributed Database Joins
• Distributed Query Optimizer
▪ Multi-mode and Multi-model data
• In-Memory Rowstore and Flash/SSD Columnstore
• SQL, JSON and Geospatial data

Real-Time Data Pipelines with Spark
▪ MemSQL Streamliner is an integrated MemSQL and Apache Spark solution
▪ Deploys Apache Spark with one click
▪ Creates real-time data pipelines through a graphical UI
▪ Open sourced on GitHub at memsql.github.io/spark-streamliner
[Diagram: Real-Time Inputs, Streamliner (Apache Spark: Extract, Transform, Load), Real-Time Application]

MemSQL Ecosystem and Architecture
[Diagram components]
▪ Messaging, Inputs
▪ Real-Time Data Pipelines
▪ Rowstore, Columnstore
▪ Relational, Key-Value, Document, Geospatial
▪ Real-Time Applications, Business Intelligence Dashboards
▪ Existing Data Stores: Hadoop, Amazon S3, MySQL
▪ Orchestration / Containers
▪ Cloud / On-Premises Platform

MemSQL Platform
[Diagram: Data Producers (simulating sensor activity), gateways, Message Queue, Data Transformation, Database, User Interface]

Real-Time Applications
[Diagram: Data Producers (simulating sensor activity), gateways, Message Queue, Data Transformation, Database, User Interface]

MemEx: IoT Showcase Application
- Combines MemSQL, Apache Kafka, and Spark for global supply chain management
- Enables enterprises to predict throughput of supply warehouses
- Processes 2 million data points, based on 2,000 sensors across 1,000 warehouses
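
The figures above are consistent with one reading in which each of the 1,000 warehouses has 2,000 sensors and each sensor contributes one data point per pass (1,000 × 2,000 = 2,000,000). That interpretation is an assumption, not something the slide states. Under it, a simulated data producer could be sketched as follows; the field names and the random placeholder value are hypothetical.

```python
import random


def simulate_readings(num_warehouses, sensors_per_warehouse, seed=0):
    """Yield one synthetic reading per sensor per warehouse."""
    rng = random.Random(seed)
    for warehouse_id in range(num_warehouses):
        for sensor_id in range(sensors_per_warehouse):
            yield {
                "warehouse_id": warehouse_id,
                "sensor_id": sensor_id,
                "value": rng.random(),  # placeholder measurement in [0, 1)
            }


# Full scale under the assumption above: 1,000 warehouses x 2,000 sensors.
full_scale_points = 1_000 * 2_000

# Small run to keep the example fast.
sample = list(simulate_readings(num_warehouses=3, sensors_per_warehouse=4))
```

Using a generator keeps memory flat, so the same function could stream the full-scale volume into a message queue one record at a time.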

MemEx Architecture
[Diagram: Data Producers (simulating sensor activity), gateways, Data Transformation (Apache Spark with a Spark MLlib Predictive Model, emitting Raw Sensor 1 + Predictive Score 1 as the pair S1, P1), MemEx UI]

Classification
▪ BLUE: Minor Damage, Type 1
▪ BLACK: training data for machine operating normally
▪ ORANGE: Major Damage, Type 2
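
To make the idea of attaching a predictive score to each raw sensor reading concrete, here is a minimal nearest-centroid classifier in plain Python. It is a stand-in for illustration only, not the Spark MLlib model used in MemEx; the two-feature space, the centroid values, and the label names are all hypothetical.

```python
import math

# Hypothetical class centroids in a two-feature space.
CENTROIDS = {
    "normal": (0.0, 0.0),
    "minor_damage_type_1": (1.0, 0.0),
    "major_damage_type_2": (0.0, 1.0),
}


def classify(features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(CENTROIDS, key=lambda label: math.dist(features, CENTROIDS[label]))


def score_stream(readings):
    """Pair each raw reading with its predicted label."""
    return [(reading, classify(reading)) for reading in readings]


scored = score_stream([(0.1, 0.1), (0.9, 0.2), (0.2, 1.1)])
labels = [label for _, label in scored]
```

Each output pair mirrors the raw-sensor-plus-predictive-score shape, so downstream queries can filter on the predicted class alongside the raw values.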

Real-time drilling sensor data to manage the high stakes of producing oil in a depressed market and maximizing productivity.
+ Top Energy Firm

TECHNICAL BENEFITS
- Enabled machine learning scoring of streaming data for real-time Predictive Analytics
- Integrated SAS BI PMML for deep analytics
- Joined multiple data types and third party sources including geospatial and weather data

Spark MLlib Predictive Model
[Diagram: Real-Time Inputs, Streamliner (Spark MLlib Predictive Model emitting Raw Sensor 1 + Predictive Score 1 as the pair S1, P1), Business Logic]

Continued Rise of IoT
Sensor Array
PoS Systems
Connected Fleets
Mobile Apps
Security
Reporting Systems
Log Systems
Data Lake
Data Warehouse
Databases
“By 2020, over 20 billion connected things will be in use across a
range of industries; the IoT will touch every role across the enterprise.”
Source: Gartner

Amazon Invests in Drones for 30 Minute Post-Order Deliveries
“These are highly automated drones. They have what is called sense-and-avoid technology. That means, basically, seeing and then avoiding obstacles.”
Yahoo, January 2016: https://www.yahoo.com/tech/exclusive-amazon-reveals-details-about-1343951725436982.html

FedEx Breaks Record With 317 Million Packages Shipped Over Christmas 2015
“FedEx Ground continues to advance the industry’s most automated hub network with investments in package sortation systems that enable flexible and reliable operations and six-sided scanning tunnels that boost data and image capture.”
FedEx, October 2015: http://about.van.fedex.com/newsroom/global-english/fedex-forecasts-record-volume-this-holiday-season/

The Evolution of Data Analytics
Descriptive Analytics, Predictive Analytics, Real-Time Analytics

High-Speed Ingest
[Diagram: Data Producers (simulating sensor activity), Message Queue, Streamliner (Apache Spark), Real-Time Application]

MemEx Architecture
[Diagram: High-Speed Ingest from Data Producers (simulating sensor activity), Streamliner (Apache Spark), Real-Time Application]

Top supply chain companies are turning to the adoption of advanced analytics to improve supply chain functions.
Source: Gartner
