- Warm-up: Kafka, Kafka Streams and KSQL as an event-driven microservices architecture. Life is a stream of events, and we can query it through SQL.
- Demo: ATM Fraud Detection with Kafka and KSQL
4. What’s Apache Kafka?
• Pub/Sub Logic - Messaging, Done Right
• STORE - Hadoop, Made Fast
• ETL & Data Integration as a Platform
• Stream processing
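The messaging and storage bullets rest on the same idea: an append-only log that each subscriber reads at its own position. The sketch below is a toy in-memory model of that idea, not Kafka's API; the class names `CommitLog` and `Consumer` are hypothetical and chosen only for illustration.

```python
class CommitLog:
    """Append-only log: records are appended, never changed in place."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Publish a record and return its offset (position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Return all records from the given offset onwards."""
        return self._records[offset:]


class Consumer:
    """Each subscriber tracks its own offset, so readers are independent."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        records = self._log.read(self.offset)
        self.offset += len(records)
        return records


log = CommitLog()
log.append("txn-1")
log.append("txn-2")

fraud_detector = Consumer(log)
audit = Consumer(log)

first_batch = fraud_detector.poll()   # reads txn-1 and txn-2
log.append("txn-3")
second_batch = fraud_detector.poll()  # reads only the new record
audit_batch = audit.poll()            # a late subscriber still sees everything
```

Because the log is stored rather than handed off and deleted, a subscriber that starts late can still replay every record.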
5. Industry Trends… and why Apache Kafka matters!
1. From ‘big data’ (batch) to ‘fast data’ (stream processing)
2. Internet of Things (IoT) and sensor data
3. Microservices and asynchronous communication (coordination messages and data streams) between loosely coupled and fine-grained services
6. Overview
• Kafka is “publish-subscribe messaging rethought as a distributed commit log”
• Fast
• Scalable
• Durable
• Distributed
7. Kafka adoption and use cases
• LinkedIn: activity streams, operational metrics, data bus (400 nodes, 18k topics, 220B msg/day, peak 3.2M msg/s; May 2014)
• Netflix: real-time monitoring and event processing
• Twitter: as part of their Storm real-time data pipelines
• Spotify: log delivery (from 4h down to 10s), Hadoop
• Loggly: log collection and processing
• Mozilla: telemetry data
• Airbnb, Cisco, Gnip, InfoChimps, Ooyala, Square, Uber, …
8. Over 35% of the Fortune 500 are using Apache Kafka™
• Travel: 6 of top 10
• Global banks: 7 of top 10
• Insurance: 8 of top 10
• Telecom: 9 of top 10
26. From Big Data to Fast Data
• Big data was: the more the better [chart: value of data vs. volume of data]
• Stream data is: the faster the better [chart: value of data vs. age of data]
• Stream data can be: big or fast (Lambda) [diagram: Streams, Hadoop, Speed Table, Batch Table, DB]
• Stream data will be: big AND fast (Kappa) [diagram: Streams, Job 1, Job 2, Table 1, Table 2, DB]
Apache Kafka is the enabling technology of this transition
40. ATM Fraud Detection with Apache Kafka and KSQL
@rmoff
Spot patterns within this stream
Ac. ID  Transaction ID    Time      ATM
A42     xxx116d91d6-ef17  11:56:58  Midland
A42     116d91d6-ef17     11:58:19  Halifax
A42     09c2f660-ef17     19:31:11  Lloyds
42. ATM Fraud Detection with Apache Kafka and KSQL
Spot patterns within this stream
Ac. ID  Transaction ID    Time      ATM      Status
A42     xxx116d91d6-ef17  11:56:58  Midland  Legit
A42     116d91d6-ef17     11:58:19  Halifax  Dodgy!
A42     09c2f660-ef17     19:31:11  Lloyds   Legit
44. ATM Fraud Detection with Apache Kafka and KSQL
Inbound stream of ATM data
• Account id
• Location
• Amount
https://github.com/rmoff/gess
45. ATM Fraud Detection with Apache Kafka and KSQL
KSQL: Stream Processing with SQL
SELECT TXN_ID, ATM,
       CUSTOMER_NAME,
       CUSTOMER_PHONE
FROM ATM_POSSIBLE_FRAUD;
47. ATM Fraud Detection with Apache Kafka and KSQL
1. Spot fraud in stream of transactions
2. Enrich transaction events with customer data
[Diagram: customer details; ATM fraud txns with customer details; Elasticsearch; notification service]
48. ATM Fraud Detection with Apache Kafka and KSQL
KSQL is the Streaming SQL Engine for Apache Kafka
49. ATM Fraud Detection with Apache Kafka and KSQL
KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• syslog data
• Sensor / IoT data
CREATE STREAM SYSLOG_INVALID_USERS AS
SELECT HOST, MESSAGE
FROM SYSLOG
WHERE MESSAGE LIKE '%Invalid user%';
http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
50. ATM Fraud Detection with Apache Kafka and KSQL
KSQL for Streaming ETL
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u
ON c.userid = u.user_id
WHERE u.level = 'Platinum';
Joining, filtering, and aggregating streams of event data
51. ATM Fraud Detection with Apache Kafka and KSQL
KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
Identifying patterns or anomalies in real-time data,
surfaced in milliseconds
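What the tumbling-window query computes can be sketched in plain Python. This is a batch model of the semantics only, not how KSQL executes the query; the function name, the `(timestamp_ms, card_number)` input shape and the sample data are assumptions made for illustration.

```python
from collections import Counter

WINDOW_MS = 5_000  # mirrors WINDOW TUMBLING (SIZE 5 SECONDS)


def possible_fraud(attempts, threshold=3):
    """Count attempts per (card, window) and keep counts above the threshold.

    `attempts` is an iterable of (timestamp_ms, card_number) pairs.
    """
    counts = Counter()
    for timestamp_ms, card_number in attempts:
        # Tumbling windows do not overlap: each event belongs to exactly one.
        window_start = timestamp_ms - (timestamp_ms % WINDOW_MS)
        counts[(card_number, window_start)] += 1
    # Mirrors HAVING count(*) > 3
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical data: four attempts on card "1111" inside one 5-second window.
attempts = [
    (1_000, "1111"), (2_000, "1111"), (3_000, "1111"), (4_000, "1111"),
    (4_500, "2222"),
    (6_000, "1111"),  # falls into the next window
]
flagged = possible_fraud(attempts)
```

Only the first window for card "1111" is flagged; the fifth attempt at 6 seconds starts a fresh count in the next window.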
52. ATM Fraud Detection with Apache Kafka and KSQL
KSQL for Data Transformation
CREATE STREAM pageviews
  WITH (PARTITIONS=4,
        VALUE_FORMAT='AVRO') AS
  SELECT * FROM pageviews_json;
Make simple derivations of existing topics from the command line
53. ATM Fraud Detection with Apache Kafka and KSQL
KSQL in Development and Production
• Interactive KSQL (REST), for development and testing: “Hmm, let me try out this idea...”
• Headless KSQL, for production: desired KSQL queries have been identified
54. ATM Fraud Detection with Apache Kafka and KSQL
Stream-stream joins
• Streams: Orders, Shipments
• Join condition: order.id = shipment.order_id
• Which orders haven't shipped?
• Lead time: shipment_ts - order_ts
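The two questions on this slide can be sketched in plain Python. This is a batch model under simplifying assumptions: a real stream-stream join is evaluated continuously and within a time window, which the sketch ignores. The record layouts and sample values are hypothetical.

```python
# Hypothetical sample records; timestamps are in milliseconds.
orders = [
    {"id": "o1", "order_ts": 1_000},
    {"id": "o2", "order_ts": 2_000},
]
shipments = [
    {"order_id": "o1", "shipment_ts": 4_500},
]


def join_orders_shipments(orders, shipments):
    """Join on order.id = shipment.order_id.

    Returns (lead_times, unshipped): the lead time per shipped order,
    and the ids of orders with no matching shipment.
    """
    shipped_at = {s["order_id"]: s["shipment_ts"] for s in shipments}
    lead_times = {}
    unshipped = []
    for order in orders:
        if order["id"] in shipped_at:
            # Lead time: shipment_ts - order_ts
            lead_times[order["id"]] = shipped_at[order["id"]] - order["order_ts"]
        else:
            unshipped.append(order["id"])
    return lead_times, unshipped


lead_times, unshipped = join_orders_shipments(orders, shipments)
```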
55. ATM Fraud Detection with Apache Kafka and KSQL
Stream-stream joins
ATM transactions
58. ATM Fraud Detection with Apache Kafka and KSQL
Self-Join (Cartesian product)
T (joined to itself as T1 and T2):
Ac. ID  Transaction ID    Time      ATM
A42     xxx116d91d6-ef17  11:56:58  Midland
A42     116d91d6-ef17     11:58:19  Halifax
A42     09c2f660-ef17     19:31:11  Lloyds
59. ATM Fraud Detection with Apache Kafka and KSQL
Self-Join (Cartesian product)
ATM_TXNS T1
  INNER JOIN ATM_TXNS T2
  ON T1.ACCOUNT_ID = T2.ACCOUNT_ID
60. ATM Fraud Detection with Apache Kafka and KSQL
Self-Join (Cartesian product)
FROM ATM_TXNS T1
  INNER JOIN ATM_TXNS T2
  WITHIN 10 MINUTES
  ON T1.ACCOUNT_ID = T2.ACCOUNT_ID
62. ATM Fraud Detection with Apache Kafka and KSQL
Self-Join
T1 Txn ID         T2 Txn ID         T1 Time   T2 Time   T1 ATM   T2 ATM
xxx116d91d6-ef17  116d91d6-ef17     11:56:58  11:58:19  Midland  Halifax
116d91d6-ef17     xxx116d91d6-ef17  11:58:19  11:56:58  Halifax  Midland
xxx116d91d6-ef17  xxx116d91d6-ef17  11:56:58  11:56:58  Midland  Midland
116d91d6-ef17     116d91d6-ef17     11:58:19  11:58:19  Halifax  Halifax
The last two rows are self joins on the same txn IDs.
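The windowed self-join can be modelled in plain Python using the three transactions from the slides. This is a sketch of the join semantics only, not of how KSQL evaluates it; the function and variable names are hypothetical.

```python
from datetime import datetime, timedelta

# (account_id, transaction_id, time, atm), copied from the slides' table
ATM_TXNS = [
    ("A42", "xxx116d91d6-ef17", "11:56:58", "Midland"),
    ("A42", "116d91d6-ef17", "11:58:19", "Halifax"),
    ("A42", "09c2f660-ef17", "19:31:11", "Lloyds"),
]


def parse_time(value):
    return datetime.strptime(value, "%H:%M:%S")


def self_join(txns, within=timedelta(minutes=10)):
    """Pair every txn (T1) with every txn (T2) on the same account whose
    time lies within `within` of T1, in either direction."""
    pairs = []
    for t1 in txns:
        for t2 in txns:
            same_account = t1[0] == t2[0]
            gap = abs(parse_time(t2[2]) - parse_time(t1[2]))
            if same_account and gap <= within:
                pairs.append((t1[1], t2[1]))
    return pairs


pairs = self_join(ATM_TXNS)
self_pairs = [p for p in pairs if p[0] == p[1]]
```

In this model the two morning transactions produce the four pairs listed in the table, and the evening Lloyds transaction additionally pairs with itself, since every record trivially falls inside its own window. The window keeps the Lloyds transaction from pairing with the morning ones.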
63. ATM Fraud Detection with Apache Kafka and KSQL
Exclude joins on the same txn
WHERE T1.TRANSACTION_ID != T2.TRANSACTION_ID
T1 Txn ID         T2 Txn ID         T1 Time   T2 Time   T1 ATM   T2 ATM
xxx116d91d6-ef17  116d91d6-ef17     11:56:58  11:58:19  Midland  Halifax
116d91d6-ef17     xxx116d91d6-ef17  11:58:19  11:56:58  Halifax  Midland
64. ATM Fraud Detection with Apache Kafka and KSQL
Exclude joins on the same txn
Duplicate results (A:B / B:A): each remaining pair appears twice, once in each order.
65. ATM Fraud Detection with Apache Kafka and KSQL
Join Windows
T1, T2 (both sides of the self-join):
Ac. ID  Transaction ID    Time      ATM
A42     xxx116d91d6-ef17  11:56:58  Midland
A42     116d91d6-ef17     11:58:19  Halifax
A42     09c2f660-ef17     19:31:11  Lloyds
WITHIN 10 MINUTES
WHERE T1.TRANSACTION_ID != T2.TRANSACTION_ID
68. ATM Fraud Detection with Apache Kafka and KSQL
Only join forward
WITHIN (0 MINUTES, 10 MINUTES)
WHERE T1.TRANSACTION_ID != T2.TRANSACTION_ID
70. ATM Fraud Detection with Apache Kafka and KSQL
Only join forward
WITHIN (0 MINUTES, 10 MINUTES)
Ignore events in the right-hand stream prior to those in the left
T1 Txn ID         T2 Txn ID      T1 Time   T2 Time   T1 ATM   T2 ATM
xxx116d91d6-ef17  116d91d6-ef17  11:56:58  11:58:19  Midland  Halifax
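The forward-only window combined with the same-txn exclusion can be modelled in plain Python. This is a sketch of the semantics, with the `before` and `after` parameters standing in for the two values of `WITHIN (0 MINUTES, 10 MINUTES)`; the `Txn` type and function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Txn(NamedTuple):
    account_id: str
    txn_id: str
    time: str
    atm: str


# Copied from the slides' table
ATM_TXNS = [
    Txn("A42", "xxx116d91d6-ef17", "11:56:58", "Midland"),
    Txn("A42", "116d91d6-ef17", "11:58:19", "Halifax"),
    Txn("A42", "09c2f660-ef17", "19:31:11", "Lloyds"),
]


def parse_time(value):
    return datetime.strptime(value, "%H:%M:%S")


def forward_join(txns, before=timedelta(0), after=timedelta(minutes=10)):
    """Pair T1 with T2 on the same account when T2 is a different txn and
    happens no more than `before` earlier and `after` later than T1."""
    pairs = []
    for t1 in txns:
        for t2 in txns:
            if t1.account_id != t2.account_id:
                continue
            if t1.txn_id == t2.txn_id:  # exclude joins on the same txn
                continue
            gap = parse_time(t2.time) - parse_time(t1.time)
            if -before <= gap <= after:  # before=0 means "only join forward"
                pairs.append((t1, t2))
    return pairs


suspects = [(t1.atm, t2.atm) for t1, t2 in forward_join(ATM_TXNS)]
```

With `before` set to zero, the Halifax-to-Midland pair (B:A) drops out because its gap is negative, leaving the single Midland-to-Halifax row.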
71. ATM Fraud Detection with Apache Kafka and KSQL
Only join forward
T1 Txn ID         T2 Txn ID      T1 Time   T2 Time   T1 ATM   T2 ATM
xxx116d91d6-ef17  116d91d6-ef17  11:56:58  11:58:19  Midland  Halifax
T1 (Midland): Legit; T2 (Halifax): Dodgy!
72. ATM Fraud Detection with Apache Kafka and KSQL
Photo by Esteban Lopez on Unsplash
73. ATM Fraud Detection with Apache Kafka and KSQL
Calculate distance between ATMs
GEO_DISTANCE(TX1.location->lat, TX1.location->lon,
             TX2.location->lat, TX2.location->lon,
             'KM')
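For readers who want to see what a distance between two latitude/longitude points involves, here is a minimal Python sketch using the haversine (great-circle) formula. It is an illustration, not KSQL's implementation of `GEO_DISTANCE`; the function name and the Earth-radius constant are assumptions of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def geo_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator, roughly 111 km.
one_degree = geo_distance_km(0.0, 0.0, 0.0, 1.0)
same_point = geo_distance_km(53.8, -1.5, 53.8, -1.5)
```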
74. ATM Fraud Detection with Apache Kafka and KSQL
Calculate time between transactions
TX2.ROWTIME - TX1.ROWTIME AS MILLISECONDS_DIFFERENCE
(TX2.ROWTIME - TX1.ROWTIME) / 1000 / 60 / 60 AS HOURS_DIFFERENCE
75. ATM Fraud Detection with Apache Kafka and KSQL
GEO_DISTANCE(…) / HOURS_DIFFERENCE AS KMH_REQUIRED
Photo by Esteban Lopez on Unsplash
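The arithmetic behind `KMH_REQUIRED` can be worked through in Python. The 81-second gap comes from the slides' example (11:56:58 to 11:58:19); the 45 km distance is a made-up value for illustration, and the function names are hypothetical. Note that Python's `/` is floating-point division, so sub-hour gaps survive; in a SQL engine, dividing an integer by integers may truncate, so a cast to a floating type may be needed there (an assumption worth checking against the KSQL version in use).

```python
def hours_difference(t1_rowtime_ms, t2_rowtime_ms):
    """Time between two transactions in hours, from millisecond timestamps."""
    return (t2_rowtime_ms - t1_rowtime_ms) / 1000 / 60 / 60


def kmh_required(distance_km, t1_rowtime_ms, t2_rowtime_ms):
    """Speed needed to cover distance_km between the two transactions.

    Returns None when the gap is zero or negative, where speed is undefined.
    """
    hours = hours_difference(t1_rowtime_ms, t2_rowtime_ms)
    if hours <= 0:
        return None
    return distance_km / hours


T1_MS = 0
T2_MS = 81_000  # 11:56:58 -> 11:58:19 is 81 seconds

gap_hours = hours_difference(T1_MS, T2_MS)
speed = kmh_required(45.0, T1_MS, T2_MS)  # 45 km is a hypothetical distance
```

A required speed far beyond any plausible means of travel is what marks the second transaction as suspect.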
76. ATM Fraud Detection with Apache Kafka and KSQL
So speaking of time…
ksql> PRINT 'atm_txns_gess';
Format:JSON
{
  "ROWTIME": 1544116309152,
  "ROWKEY": "null",
  "account_id": "a218",
  "timestamp": "2018-12-06 17:09:58 +0000",
  "atm": "HSBC",
  …}
• ROWTIME: Kafka message timestamp (2018-12-06 17:11:49)
• "timestamp": event time
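The gap between the two timestamps in this message can be checked with a few lines of Python. The sketch uses only the two values shown in the PRINT output and assumes ROWTIME is milliseconds since the Unix epoch in UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# The two time fields from the PRINT output above
message = {
    "ROWTIME": 1544116309152,
    "timestamp": "2018-12-06 17:09:58 +0000",
}

# Kafka message timestamp: epoch milliseconds
kafka_time = EPOCH + timedelta(milliseconds=message["ROWTIME"])

# Event time: carried in the payload as a formatted string
event_time = datetime.strptime(message["timestamp"], "%Y-%m-%d %H:%M:%S %z")

lag_seconds = (kafka_time - event_time).total_seconds()
kafka_time_text = kafka_time.strftime("%Y-%m-%d %H:%M:%S")
```

Here the message was written to Kafka just under two minutes after the event it describes, which is why time-based logic should be explicit about which of the two timestamps it uses.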
78. ATM Fraud Detection with Apache Kafka and KSQL
But what about the account holder?
79. ATM Fraud Detection with Apache Kafka and KSQL
Photo by Samuel Zeller on Unsplash
80. ATM Fraud Detection with Apache Kafka and KSQL
1. Enrich transaction events with customer data
[Diagram: customer details; ATM fraud txns with customer details; Elasticsearch; notification service]
81. ATM Fraud Detection with Apache Kafka and KSQL
Streaming Integration with Kafka Connect
Kafka Brokers | Kafka Connect (Tasks, Workers)
Sources: syslog, flat file, CSV, JSON, MQTT
82. ATM Fraud Detection with Apache Kafka and KSQL
Streaming Integration with Kafka Connect
Kafka Brokers | Kafka Connect (Tasks, Workers)
Sinks: Amazon S3, MQTT
84. ATM Fraud Detection with Apache Kafka and KSQL
Confluent Hub
hub.confluent.io
• One-stop place to discover and download:
  • Connectors
  • Transformations
  • Converters
85. ATM Fraud Detection with Apache Kafka and KSQL
Demo Time!
[Diagram: customer details; Kafka Connect; Debezium]
86. ATM Fraud Detection with Apache Kafka and KSQL
Do you think that’s a table you are querying?
87. ATM Fraud Detection with Apache Kafka and KSQL
The Table Stream Duality
Stream (Account ID, Amount)    Table (Account ID, Balance)
12345  +€50                    12345  €50
12345  +€25                    12345  €75
12345  -€60                    12345  €15
(Time runs from top to bottom.)
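The duality can be sketched in a few lines of Python: replaying the stream of amounts rebuilds the table of balances, one snapshot per event. The function name is hypothetical; the data is the slide's own.

```python
def apply_stream(events):
    """Fold a stream of (account_id, amount) events into a balance table,
    recording a snapshot of the table after every event."""
    balances = {}
    snapshots = []
    for account_id, amount in events:
        balances[account_id] = balances.get(account_id, 0) + amount
        snapshots.append(dict(balances))  # copy: the table as of this event
    return snapshots


# The stream from the slide: +50, +25, -60 on account 12345
events = [("12345", 50), ("12345", 25), ("12345", -60)]
snapshots = apply_stream(events)
latest_table = snapshots[-1]
```

The table only ever holds the latest balance, while the stream retains every change, so the table can always be rebuilt from the stream but not the other way round.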
88. ATM Fraud Detection with Apache Kafka and KSQL
The truth is the log.
The database is a cache
of a subset of the log.
—Pat Helland
Immutability Changes Everything
http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
Photo by Bobby Burch on Unsplash
90. ATM Fraud Detection with Apache Kafka and KSQL
Suspect Transactions
Ac. ID  T1 Time   T1 ATM   T2 Time   T2 ATM
A42     11:56:58  Midland  11:58:19  Halifax  (Dodgy!)
91. ATM Fraud Detection with Apache Kafka and KSQL
Suspect Transactions
Name     Phone     Ac. ID  T1 Time   T1 ATM   T2 Time   T2 ATM
Robin M  1234 567  A42     11:56:58  Midland  11:58:19  Halifax
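Enriching a suspect transaction with customer details amounts to a lookup of each event against a table keyed by account ID. The Python sketch below models that stream-table join with the slide's values; the dictionary layout and the function name are assumptions made for illustration.

```python
# Table side: latest customer details per account (values from the slide)
accounts = {
    "A42": {"name": "Robin M", "phone": "1234 567"},
}

# Stream side: one suspect transaction pair (values from the slide)
suspect = {
    "account_id": "A42",
    "t1_time": "11:56:58", "t1_atm": "Midland",
    "t2_time": "11:58:19", "t2_atm": "Halifax",
}


def enrich(event, accounts):
    """Attach customer details to an event by looking up its account_id.

    Returns None when the account is unknown, which models an inner join.
    """
    customer = accounts.get(event["account_id"])
    if customer is None:
        return None
    return {**customer, **event}


enriched = enrich(suspect, accounts)
unknown = enrich({"account_id": "Z99"}, accounts)
```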
93. ATM Fraud Detection with Apache Kafka and KSQL
1. Spot fraud in stream of transactions
2. Enrich transaction events with customer data
[Diagram labels: atm_txns_gess; accounts (customer details); ATM_POSSIBLE_FRAUD_ENRICHED (ATM fraud txns with customer details); Elasticsearch; notification service]
94. ATM Fraud Detection with Apache Kafka and KSQL
What can we do with it?
Photo by Joshua Rodriguez on Unsplash
95. ATM Fraud Detection with Apache Kafka and KSQL
Realtime Operations View & Analysis
96. ATM Fraud Detection with Apache Kafka and KSQL
Push notification to the customer
97. ATM Fraud Detection with Apache Kafka and KSQL
Confluent Community Components
Apache Kafka with a bunch of cool stuff! For free!
• Database Changes | Log Events | IoT Data | Web Events | …
• Data Integration: CRM, Data Warehouse, Database, Hadoop, …
• Real-time Applications: Monitoring, Analytics, Custom Apps, Transformations, …
Confluent Platform
• Apache Kafka®: Core | Connect API | Streams API
• Data Compatibility: Schema Registry
• Monitoring & Administration: Confluent Control Center | Security
• Operations: Replicator | Auto Data Balancing
• Development and Connectivity: Clients | Connectors | REST Proxy | CLI
• SQL Stream Processing: KSQL
• Customer self-managed: Datacenter, Public Cloud
• Confluent fully-managed: Confluent Cloud
98. ATM Fraud Detection with Apache Kafka and KSQL
Free Books!
https://www.confluent.io/apache-kafka-stream-processing-book-bundle
100. ATM Fraud Detection with Apache Kafka and KSQL
Resources
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / partners@confluent.io) can help with introductions on a given sales opportunity