Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Rachel Pedreschi, Field Engineering Director, Imply Data
Josh Treichel, Partner Solutions Architect, Confluent
Josh Treichel
Partner Solutions Architect, Confluent
As a software engineer, Josh has spent over 10 years building, integrating and supporting complex systems. He previously worked on Confluent’s Customer Operations Team supporting some of the largest Kafka and Confluent deployments in the world.
Rachel Pedreschi
Field Engineering Director, Imply Data
A "big data geek-ette," Rachel is no stranger to the world of big data, fast data and everything in between. She is a Vertica-, Informix-, and Redbrick-certified database administrator on top of her work with Apache Cassandra™, Apache® Ignite™ and Apache Druid (incubating). She has more than 20 years of high-performance database experience. Rachel has an MBA from San Francisco State University.
Session Overview
● This session will be one hour
● The last 10-15 minutes will consist of Q&A
● Submit questions by entering them into the GoToWebinar panel
● The slides and recording will be available
https://tinyurl.com/confluentimply
Founded by the creators of Apache Kafka
Technology developed while at LinkedIn
Largest contributor and tester of Apache Kafka
● Founded in 2014
● Raised $84M from Benchmark, Index, Sequoia
● 350+ Employees
● Transacting in 20 countries
● Hundreds of enterprise subscription customers
● Commercial entities in US, UK, Germany, Australia
Business Digitization Trends are Revolutionizing your Data Flow
● Massive volumes of new data generated every day
● Distributed across apps, devices, datacenters, clouds
● Structured, unstructured
● Trends: Mobile, Cloud, Microservices, Internet of Things, Machine Learning
Legacy Data Infrastructure Solutions Have Architectural Flaws
[Diagram: Apps, DBs, transactional databases, analytics databases and a DWH, with data flowing through MOM, ETL and ESB]
These solutions can be
● Batch-oriented, instead of event-oriented in real time
● Complex to scale at high throughput
● Connected point-to-point, instead of publish / subscribe
● Lacking data persistence and retention
● Incapable of in-flight message processing
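To make the contrast with point-to-point delivery concrete, here is a minimal Python sketch of a publish / subscribe log with retention. `TopicLog` and its methods are hypothetical names invented for this illustration; they are not part of any Kafka client API. The sketch only shows the idea that records stay in the log after being read, so each subscriber tracks its own position and can read independently.

```python
from collections import defaultdict
from typing import Any, Dict, List


class TopicLog:
    """Toy append-only log (hypothetical stand-in for a topic, not a Kafka API)."""

    def __init__(self) -> None:
        self._records: List[Any] = []  # records are retained after being read
        self._offsets: Dict[str, int] = defaultdict(int)  # next offset per subscriber

    def publish(self, record: Any) -> int:
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, subscriber: str) -> List[Any]:
        """Return the records this subscriber has not seen yet, then advance its offset."""
        start = self._offsets[subscriber]
        batch = self._records[start:]
        self._offsets[subscriber] = len(self._records)
        return batch

    def replay(self, from_offset: int = 0) -> List[Any]:
        """Re-read retained records without moving any subscriber offset."""
        return self._records[from_offset:]


log = TopicLog()
log.publish({"event": "page_view"})
log.publish({"event": "checkout"})

analytics_batch = log.poll("analytics")  # both records
billing_batch = log.poll("billing")      # the same two records, read independently
log.publish({"event": "refund"})
analytics_next = log.poll("analytics")   # only the record published since the last poll
```

Because nothing is deleted on read, a new subscriber added later could still call `replay(0)` and see the full history, which is what a queue that deletes on delivery cannot offer.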
Modern Architectures are Adapting to New Data Requirements
But how do we revolutionize data flow in a world of exploding, distributed and ever-changing data?
[Diagram: Apps, DBs, NoSQL DBs, Big Data Analytics, transactional databases, analytics databases and a DWH, with data flowing through MOM, ETL and ESB]
The Solution is a Streaming Platform for Real-Time Data Processing
A Streaming Platform provides a single source of truth about your data to everyone in your organization
[Diagram: Apps, DBs, NoSQL DBs, Big Data Analytics, transactional databases, analytics databases and a DWH, with data flowing through a Streaming Platform]
Kafka: Next Generation Messaging
Over 35% of the Fortune 500 Already Trust Kafka for Mission-Critical Apps
● Travel: 6 of top 10
● Global banks: 7 of top 10
● Insurance: 8 of top 10
● Telecom: 9 of top 10
Confluent Platform
The streaming platform built by the creators of Apache Kafka
● Pub-sub messaging in real-time at scale
● Connectivity for all producers and consumers
● Data persistence with infinite retention
● Stream processing without coding
● Distributed architecture for global deployment
Confluent Delivers a Mission-Critical Streaming Platform
Confluent Platform
● Apache Kafka®: Core | Connect API | Streams API
● Development & Connectivity: Clients | Connectors | REST Proxy | KSQL
● Data Compatibility: Schema Registry
● Management & Monitoring: Control Center | Security
● Enterprise Operations: Replicator | Auto Data Balancer | Connectors | MQTT Proxy | Operator
Events: Database Changes | Log Events | IoT Data | Web Events | other events
DATA INTEGRATION: Hadoop | Database | Data Warehouse | CRM | other
REAL-TIME APPLICATIONS: Transformations | Custom Apps | Analytics | Monitoring | other
Deployment: Datacenter | Public Cloud | Confluent Cloud
CUSTOMER SELF-MANAGED | CONFLUENT FULLY-MANAGED
Legend: OPEN SOURCE FEATURES | COMMERCIAL FEATURES
KSQL: Enable Stream Processing using SQL-like Semantics
Example Use Cases
• Streaming ETL
• Anomaly detection
• Event monitoring
Leverage Kafka Streams API without any coding required
Use any programming language
Connect via Control Center UI, CLI, REST or headless
[Diagram: CLI, Clients and Confluent Control Center GUI; KSQL server with Engine (runs queries) and REST API; Kafka Cluster]
KSQL: the Simplest Way to Do Stream Processing
Streaming ETL
CREATE STREAM enriched_clickstream AS
  SELECT userid, status, request, ip, users.city
  FROM clickstream c
  LEFT JOIN web_users users ON c.userid = users.user_id;
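For readers less familiar with SQL joins, the following Python sketch mimics what the KSQL statement on this slide expresses: each click is enriched with the user's city, and clicks with no matching user are still emitted (the LEFT JOIN behavior). It assumes, purely for illustration, that `web_users` behaves like a lookup table keyed by `user_id`; the sample records are invented, and the function is not how KSQL itself is implemented.

```python
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]


def enrich_clickstream(clicks: Iterable[Record], web_users: Dict[str, Record]) -> Iterator[Record]:
    """Emit one enriched record per click, keeping clicks that have no matching user."""
    for click in clicks:
        user = web_users.get(click["userid"])  # join on c.userid = users.user_id
        yield {
            "userid": click["userid"],
            "status": click["status"],
            "request": click["request"],
            "ip": click["ip"],
            # LEFT JOIN: the city is None when the user is unknown
            "city": user["city"] if user is not None else None,
        }


# Invented sample data for the sketch
web_users = {"u1": {"user_id": "u1", "city": "Berlin"}}
clicks = [
    {"userid": "u1", "status": 200, "request": "GET /home", "ip": "10.0.0.1"},
    {"userid": "u9", "status": 404, "request": "GET /missing", "ip": "10.0.0.2"},
]

enriched = list(enrich_clickstream(clicks, web_users))
```

The generator processes one click at a time, which loosely mirrors how a streaming query handles each event as it arrives rather than waiting for a batch.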
Founded by the creators of Apache Druid (incubating) and D3
Technology developed while at Metamarkets
Largest contributor and tester of Apache Druid
● Founded in 2015
● Raised $13M from Andreessen Horowitz, Khosla
● 1000s of open source implementations
● Hundreds of enterprise subscription customers
● End-to-end streaming analytics platform built on Apache Druid
Old World - Data Warehouses and Data Marts
images copied from: https://panoply.io/data-warehouse-guide/data-warehouse-architecture-traditional-vs-cloud/
Less Old World - Data Lakes
images copied from: https://panoply.io/data-warehouse-guide/data-warehouse-architecture-traditional-vs-cloud/
New World - Data Rivers!
images copied from: https://panoply.io/data-warehouse-guide/data-warehouse-architecture-traditional-vs-cloud/
Imply is the only end-to-end solution for streaming analytics built on Apache Druid
DEMO TIME!!
Putting KSQL to work with Imply: Monitoring for real-time business alerts
User click-stream data flows through Kafka and is enriched with KSQL
Data is easily visualized and explored in Imply
[Diagram: Your website, Click-stream, KSQL, Exploratory Analytics, Dashboards, Machine Learning / AI, Real-time insights]
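The slide does not say how a business alert is derived from the enriched click-stream. As a hedged illustration only, one simple approach is to count error responses per city in fixed time windows and flag windows that reach a threshold. The `ts` field (seconds), the 60-second window, the threshold of 2 and the reading of `status` as an HTTP status code are all assumptions made for this sketch; `status` and `city` are the only field names taken from the deck.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

WINDOW_SECONDS = 60  # hypothetical window size
ERROR_THRESHOLD = 2  # hypothetical alert threshold


def error_alerts(events: Iterable[Dict[str, Any]]) -> List[Tuple[int, str, int]]:
    """Return (window_start, city, error_count) for windows at or above the threshold."""
    counts: Counter = Counter()
    for event in events:
        if event["status"] >= 400:  # assumption: status is an HTTP code, 4xx/5xx are errors
            window_start = event["ts"] - event["ts"] % WINDOW_SECONDS
            city = event.get("city") or "unknown"  # unmatched users have no city
            counts[(window_start, city)] += 1
    return sorted(
        (window, city, n) for (window, city), n in counts.items() if n >= ERROR_THRESHOLD
    )


# Invented sample events for the sketch
events = [
    {"ts": 5, "status": 200, "city": "Oslo"},
    {"ts": 10, "status": 500, "city": "Oslo"},
    {"ts": 50, "status": 404, "city": "Oslo"},
    {"ts": 70, "status": 500, "city": "Oslo"},
    {"ts": 75, "status": 500, "city": "Lima"},
]

alerts = error_alerts(events)
```

This version works on a finished list of events for clarity; a streaming system would maintain the same per-window counts incrementally as events arrive.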
Q&A
Resources and Next Steps
https://confluent.io
http://cnfl.io/ksql
http://cnfl.io/slack
#ksql
@confluentinc
Thank you for joining us!
