The volume, complexity and unpredictability of streaming data are greater than ever before. Innovative organizations require instant insight from streaming data in order to make real-time business decisions. A new technology stack is emerging as traditional databases and data lakes struggle to analyze streaming data and historical data together in real time.
Confluent Platform, a more complete distribution of Apache Kafka®, works with Kinetica’s GPU-accelerated engine to transform data on the wire and to ingest and analyze it simultaneously. With the Kinetica Connector, end users can stream data from sensors, mobile apps, IoT devices and social media through Kafka into Kinetica’s database, where it is combined with data at rest. Together, the technologies deliver event-driven, real-time data to power speed-of-thought analytics, improve customer experience, deliver targeted marketing offers and increase operational efficiency.
3. CONSTANTLY GROWING NUMBER OF SMART DEVICES DRIVING STREAMING EXTREME DATA
Explosion of Data
• Structured
• Unstructured
• Smart devices
• Connected cars
• Sensors
Real-time Demands
• Need for … driven
• Demand for analytics on data that is not stale
Existing tech doesn’t work
• Workloads are I/O- and compute-bound
• Too complex: involves … technologies together
• Batch oriented
8. PARALLEL INGEST PROVIDES HIGH PERFORMANCE STREAMING
Diagram: three nodes, each 1 NODE (1TB/2GPU), performing PARALLEL INGEST.
Each node of the system shares the task of data ingest, which provides more and faster throughput. Ingest can be made faster by adding more nodes.
Chart: Time to ingest 100M Tweets, in seconds (lower is better): 150 s, 753 s, 1029 s. Systems compared include a leading in-memory DB and a NoSQL DB.
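The parallel-ingest idea can be illustrated with a minimal Python sketch. It assumes each record is routed to a node by hashing its key, and it simulates each node with a worker thread that loads its own batch; the names `NODES`, `route`, and `ingest` are hypothetical and are not Kinetica's ingest API.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NODES = 3  # hypothetical cluster size


def route(key: str, nodes: int) -> int:
    """Choose a node for a record by hashing its key (stable across runs)."""
    return crc32(key.encode("utf-8")) % nodes


def ingest(records, nodes: int = NODES):
    """Split records into one batch per node, then load all batches concurrently."""
    batches = [[] for _ in range(nodes)]
    for key, value in records:
        batches[route(key, nodes)].append((key, value))

    def load(batch):
        # Stand-in for one node writing its batch to its own storage.
        return list(batch)

    # One worker per node; threads here only simulate independent nodes.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return list(pool.map(load, batches))


tweets = [(f"user{i}", f"tweet {i}") for i in range(10)]
shards = ingest(tweets)
print([len(s) for s in shards])
```

`crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, so routing would not be reproducible. Adding nodes shrinks each node's batch; redistributing data already stored when the cluster grows is a separate problem this sketch ignores.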
9. KINETICA & CONFLUENT IN YOUR ECOSYSTEM
• Sources: STREAMING DATA; ERP / CRM / TRANSACTIONAL
• Integration: ETL / STREAM PROCESSING; CERTIFIED CONNECTORS
• Kinetica interfaces: SQL; Native APIs; Parallel Ingest; Geospatial WMS; Custom Connectors
• Kinetica capabilities: UDFs; Custom Logic; Built-in Machine Learning (BIDMach); On-demand scale-out
• Consumers: BI Dashboards; BI / GIS / Apps; Custom Apps & Geospatial; Kinetica ‘Reveal’
10. BUILT FOR BUSINESS USERS & DATA SCIENTISTS
• MACHINE LEARNING
• MASSIVELY PARALLEL COMPUTING
• CUSTOM APPLICATIONS
• GEOSPATIAL
• VISUALIZATION
• STREAMING DATA ANALYSIS
• ADVANCED ANALYTICS
Audiences: BUSINESS USERS; DATA SCIENTISTS / DEVELOPERS
11. CABLE & BROADCASTING | REAL-TIME VIEWERSHIP ANALYSIS
LARGE US CABLE PROVIDER
BUSINESS OBJECTIVE
Real-time analysis of live viewership across all broadcast channels, particularly for live events (e.g., Super Bowl, Olympics)
NEW CAPABILITIES DELIVERED
Ability to collect data streaming from set-top boxes and analyze it in real time, so the senior executive team can track viewership
12. ADTECH | REAL-TIME CAMPAIGN REPORTING
BUSINESS OBJECTIVE
Be first to market with game-changing technologies that put publishers’ needs first
NEW CAPABILITIES DELIVERED
High-speed data ingest, storage, and persistence
Ad-hoc analytics on ad impression and bid data
16. Kafka is much more than messaging
Kafka is a blend of messaging, stream processing, ETL and modern database designs built around a distributed log.
• Pub/Sub Messaging (e.g., IBM MQ, TIBCO, RabbitMQ)
• ETL Connectors (e.g., Mulesoft, Talend, Informatica)
• Stream Processing (e.g., Spark, Flink, Beam)
+ Streaming Platform
+ Distributed Clustered Storage
+ Exactly Once
+ Designed for the Cloud
+ Inter-DC Replication
+ Schema Evolution
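The "distributed log" at the center of this design can be illustrated with a single-process Python sketch: records are only ever appended, each one gets a sequential offset, and every consumer tracks its own offset, so one consumer reading does not remove data for another. This illustrates the concept only; replication, partitioning across brokers, and retention are omitted, and the class name `Log` is hypothetical.

```python
class Log:
    """A minimal in-memory append-only log, in the spirit of one Kafka partition."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 100):
        """Return up to max_records starting at offset; reading never deletes data."""
        return self._records[offset:offset + max_records]


log = Log()
for event in ["login", "view", "purchase"]:
    log.append(event)

# Two independent consumers, each remembering its own position in the log.
positions = {"analytics": 0, "billing": 0}
batch = log.read(positions["analytics"])
positions["analytics"] += len(batch)  # analytics has caught up; billing has not moved
```

Because consumption is just "advance my own offset", the same stored stream can serve messaging-style subscribers, replaying ETL jobs, and stream processors at the same time.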
20. Confidential 20
What does a streaming platform do?
• Publish and subscribe to streams of data: similar to a message queue or enterprise messaging system
• Store streams of data: in a fault-tolerant way
• Process streams of data: in real time, as they occur
21. A streaming platform has many benefits
• Lower latency: better customer experience
• Decoupled architecture: future-proof, reduces risk, reduces costs, easier to run
• Highly performant and scalable
24. Apache Kafka
Kafka Streams API
Write standard Java applications and microservices to process your data in real time
Kafka Connect API
Reliable and scalable integration of Kafka with other systems, with no coding required
Diagram labels: Orders, Table, Customers, Kafka Streams API
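The Orders/Customers diagram reflects the idea that a table can be derived from a stream. The Python sketch below assumes a changelog of `(customer_id, record)` events in which `None` marks a deletion (a "tombstone", as Kafka uses in compacted topics) and folds it into a table holding the latest value per key. It illustrates the concept only; the real Kafka Streams API is Java, and the function name `materialize` is hypothetical.

```python
def materialize(changelog):
    """Fold a changelog stream into a table: the latest value per key wins."""
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # tombstone: delete the key if present
        else:
            table[key] = value  # upsert: a newer value replaces an older one
    return table


customers_changelog = [
    ("c1", {"name": "Ada", "level": "Gold"}),
    ("c2", {"name": "Lin", "level": "Silver"}),
    ("c1", {"name": "Ada", "level": "Platinum"}),  # update to c1
    ("c2", None),                                  # c2 deleted
]
customers = materialize(customers_changelog)
```

Replaying the same changelog always rebuilds the same table, which is why a table can be stored in Kafka as a topic and recovered after a failure.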
25. Confluent Open Source
Connectors and Clients
Native Apache Kafka producer/consumer client libraries, plus connectors for Kafka Connect
KSQL
Streaming SQL engine for Apache Kafka
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
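The `possible_fraud` query can be read step by step: each authorization attempt falls into one non-overlapping 5-second window, attempts are counted per card and window, and only groups with more than three attempts are kept. The Python sketch below computes that result over a finite list, assuming each event carries a millisecond timestamp; KSQL instead updates the table continuously as events arrive and handles late data, which this sketch ignores. `possible_fraud` here is a plain function, not a KSQL API.

```python
from collections import Counter

WINDOW_MS = 5_000  # WINDOW TUMBLING (SIZE 5 SECONDS)


def possible_fraud(attempts, threshold=3):
    """Count attempts per (card, tumbling window) and keep counts above threshold.

    attempts: iterable of (timestamp_ms, card_number) pairs, timestamps >= 0.
    Returns {(card_number, window_start_ms): count}.
    """
    counts = Counter()
    for ts_ms, card in attempts:
        window_start = ts_ms - ts_ms % WINDOW_MS  # align to the window boundary
        counts[(card, window_start)] += 1  # GROUP BY card_number (per window)
    return {k: c for k, c in counts.items() if c > threshold}  # HAVING count(*) > 3


attempts = [(1_000, "4111"), (1_500, "4111"), (2_000, "4111"), (4_900, "4111"),
            (5_100, "4111"),  # lands in the next window
            (3_000, "5500")]
print(possible_fraud(attempts))  # {('4111', 0): 4}
```

The fifth attempt on card `4111` is not counted with the first four because it belongs to the window starting at 5000 ms; tumbling windows never overlap.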
26. Confluent Open Source
REST Proxy
Send and receive data to/from Apache Kafka using REST calls
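As an illustration of producing through REST calls, the Python sketch below builds (but does not send) a produce request in the REST Proxy v2 JSON format. It assumes a REST Proxy at `http://localhost:8082` (its usual default port) and a hypothetical topic `tweets`; check the REST Proxy API reference for the version you actually run.

```python
import json
import urllib.request

REST_PROXY = "http://localhost:8082"  # assumed address of a local REST Proxy
TOPIC = "tweets"                      # hypothetical topic name


def build_produce_request(records):
    """Build a POST /topics/<topic> request whose record values are JSON."""
    body = json.dumps({"records": [{"value": r} for r in records]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{REST_PROXY}/topics/{TOPIC}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    )


req = build_produce_request([{"user": "ada", "text": "hello"}])
# Sending requires a running REST Proxy: urllib.request.urlopen(req)
```

Any HTTP client can do the same, which is what lets applications without a native Kafka client library publish to a topic.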
Schema Registry
Store and enforce the schemas used per topic
{
"type": "record",
"name": "LOGON",
"namespace": "ORCL.SOE2",
"fields": [
{
"name": "table",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "op_type",
"type": [
"null",
"string"
],
"default": null
},
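Schema Registry can reject schema changes that would break existing consumers. One Avro rule behind backward compatibility can be sketched in Python: a field added to a record schema needs a `default`, because data written with the old schema does not contain it. This is a simplified illustration assuming flat record schemas like the `LOGON` example above (the added field `op_ts` is hypothetical); it is not Schema Registry's actual checker, which also handles type promotions, unions, and other resolution rules.

```python
def added_fields_have_defaults(old_schema: dict, new_schema: dict) -> bool:
    """Simplified backward-compatibility check for flat Avro record schemas.

    A reader using new_schema can decode data written with old_schema only if
    every field that is new in new_schema declares a default value.
    """
    old_names = {field["name"] for field in old_schema["fields"]}
    return all(
        "default" in field  # "default": null (None) counts as a default
        for field in new_schema["fields"]
        if field["name"] not in old_names
    )


old = {"type": "record", "name": "LOGON", "fields": [
    {"name": "table", "type": ["null", "string"], "default": None},
    {"name": "op_type", "type": ["null", "string"], "default": None},
]}
new_ok = {"type": "record", "name": "LOGON", "fields": old["fields"] + [
    {"name": "op_ts", "type": ["null", "string"], "default": None},  # hypothetical
]}
new_bad = {"type": "record", "name": "LOGON", "fields": old["fields"] + [
    {"name": "op_ts", "type": "string"},  # no default: old records cannot supply it
]}
```

Removing a field passes this particular check, since a new reader simply ignores data it no longer declares.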
27. Confluent Open Source
Docker images, deb/yum installers
Easier to install and deploy; also AWS and Azure quickstart templates
Confluent CLI
Easily work with Confluent Platform in a single-node sandbox environment
29. Confluent Enterprise
Auto Data Balancer
Dynamically move partitions to optimize resource utilization and reliability
Diagram: Before, Rebalance, After
JMS Client
Integrate with existing JMS applications and migrate seamlessly away from legacy JMS MQs
Enhanced Security
ACL support for REST Proxy and Schema Registry
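Auto Data Balancer's internal algorithm is not described here, so the following only illustrates what "moving partitions to optimize resource utilization" means. A simple greedy heuristic, assumed for this sketch, repeatedly moves the largest partition that narrows the gap between the most- and least-loaded brokers. Broker names, partition names, and sizes are hypothetical; a production balancer also has to account for replicas and the cost of copying data.

```python
def rebalance(assignment, sizes):
    """Greedily move partitions from the busiest broker to the idlest one.

    assignment: {broker: [partition, ...]}; sizes: {partition: size}.
    Returns (new_assignment, moves) where moves are (partition, src, dst).
    The input assignment is not modified.
    """
    assignment = {broker: list(parts) for broker, parts in assignment.items()}

    def load(broker):
        return sum(sizes[p] for p in assignment[broker])

    moves = []
    while True:
        src = max(assignment, key=load)
        dst = min(assignment, key=load)
        gap = load(src) - load(dst)
        # Moving a partition of size s with 0 < s < gap strictly narrows the gap.
        candidates = [p for p in assignment[src] if 0 < sizes[p] < gap]
        if not candidates:
            return assignment, moves
        part = max(candidates, key=lambda p: sizes[p])
        assignment[src].remove(part)
        assignment[dst].append(part)
        moves.append((part, src, dst))


current = {"broker-1": ["p0", "p1", "p2", "p3"], "broker-2": []}
sizes = {"p0": 40, "p1": 30, "p2": 20, "p3": 10}  # hypothetical sizes
balanced, moves = rebalance(current, sizes)
```

Here two moves (`p0`, then `p3`) take the brokers from 100/0 to 50/50. Each move strictly lowers the sum of squared broker loads, so the loop always terminates.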
31. KSQL: the streaming SQL engine for Apache Kafka from Confluent
✓ All you need is SQL
✓ No separate processing cluster required
✓ Powered by Kafka: elastic, scalable, distributed, battle-tested
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u
ON c.userid = u.userid
WHERE u.level = 'Platinum';
KSQL is the simplest way to process streams of data in real time
✓ Perfect for streaming ETL, anomaly detection, event monitoring and more
✓ Part of Confluent Open Source
https://github.com/confluentinc/ksql
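The `vip_actions` query above joins each click event against the current state of the `users` table. The Python sketch below emulates that result for a finite list of clicks, assuming `users` is a dict keyed by `userid`; the function name mirrors the query but is hypothetical. Note that the `WHERE u.level = 'Platinum'` filter discards clicks with no matching user (their `level` is NULL), so in the output this LEFT JOIN behaves like an inner join.

```python
def vip_actions(clickstream, users):
    """Emulate: SELECT userid, page, action FROM clickstream c
    LEFT JOIN users u ON c.userid = u.userid WHERE u.level = 'Platinum'."""
    for click in clickstream:  # each event is enriched as it arrives
        user = users.get(click["userid"])  # LEFT JOIN: no match gives None (NULL)
        # WHERE u.level = 'Platinum' drops unmatched and non-Platinum rows.
        if user is not None and user.get("level") == "Platinum":
            yield {"userid": click["userid"], "page": click["page"],
                   "action": click["action"]}


users = {"u1": {"level": "Platinum"}, "u2": {"level": "Gold"}}
clicks = [
    {"userid": "u1", "page": "/home", "action": "view"},
    {"userid": "u2", "page": "/cart", "action": "add"},
    {"userid": "u3", "page": "/home", "action": "view"},  # unknown user
]
result = list(vip_actions(clicks, users))
```

Because the lookup uses whatever the table holds when each click arrives, a user upgraded to Platinum later does not retroactively change earlier output.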
32. KSQL: the simplest way to do stream processing
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
33. KSQL: the simplest way to do stream processing
Monitoring:
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
Streaming ETL:
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u ON c.userid = u.userid
WHERE u.level = 'Platinum';
Anomaly detection:
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
34. 1) How to run KSQL: standalone, aka “local mode”
• Starts a CLI, an engine and a REST server all in the same JVM
• Ideal for laptop development
• Start with default settings:
> bin/ksql-cli local
• Or with customized settings:
> bin/ksql-cli local --properties-file foo/bar/ksql.properties
35. 2) How to run KSQL: client-server
• Start any number of server nodes:
> bin/ksql-server-start
• Start any number of CLIs and specify the “remote” server address:
> bin/ksql-cli remote http://myserver:8090
• All running engines share the processing load
• Technically, they are instances of the same Kafka Streams application
• Scale up/down without restart
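Sharing the processing load works because every KSQL server runs the same Kafka Streams application, and the input topic's partitions are divided among the running instances; when a server joins or leaves, the partitions are reassigned. The round-robin rule in this Python sketch is an assumption chosen for clarity (Kafka's real assignors differ, for example by trying to keep partitions where they already are), and the instance names are hypothetical.

```python
def assign(partitions, instances):
    """Round-robin partitions over instances (a simplification of group assignment)."""
    assignment = {inst: [] for inst in instances}
    for i, p in enumerate(sorted(partitions)):
        assignment[instances[i % len(instances)]].append(p)
    return assignment


partitions = list(range(6))
before = assign(partitions, ["ksql-1", "ksql-2"])
after = assign(partitions, ["ksql-1", "ksql-2", "ksql-3"])  # a third server joins
```

One consequence carries over to the real system: parallelism is capped by the partition count, so with six partitions, an eighth server would receive no work.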
36. 3) How to run KSQL: as an application
• Start any number of engine instances
• Pass a file of KSQL statements to execute:
> bin/ksql-node query-file=foo/bar.sql
• Ideal for streaming ETL application deployment
• Version-control your queries and transformations as code
• All running engines share the processing load
• Technically, they are instances of the same Kafka Streams application
• Scale up/down without restart
37. DEMO
Live data from our San Francisco firewall
• Machine Learning
• Reveal dashboard for analysts
• SQL Developer to produce insights
• Real-time joins on streaming data
Diagram labels: Kafka, Netflow, Blacklist, KSQL, Threats, INSIGHTS
39. DEMO RECAP
• Fast-moving data
• Bringing together all types of analysts on real-time data
• Manipulate real-time data to trigger events
Diagram labels: Kafka, Netflow, Blacklist, KSQL, Threats, INSIGHTS