Apache Kafka: an Open Source Event Streaming Platform
Acquisition, analysis, and evaluation of data streams in real time
Introduction
Event Streaming
ETL/Data Integration:
• Batch
• Expensive
• Time Consuming
• Difficult to Scale

Messaging:
• No Persistence After Consumption
• No Replay

Event Streaming:
• Highly Scalable
• Durable
• Persistent
• Ordered
• Real-time
• ETL/Data Integration: what happened in the world (stored records)
• Messaging: what is happening in the world (transient messages)
• Event Streaming: what is contextually happening in the world (data as a continually updating stream of events)
Why Combine Real-time With Historical Context?

• Event-Driven App (Location Tracking): answers "Where is my driver?" from real-time events only. Messaging queues and event streaming platforms can do this.
• Contextual Event-Driven App (ETA): answers "When will my driver get here?" ("2 min") by combining real-time events with stored data. Only event streaming platforms can do this.
Event Streaming Paradigm
• Highly Scalable
• Durable
• Persistent
• Maintains Order
• Fast (Low Latency)
Stream Processing
• Create and store materialized views
• Filter
• Analyze in-flight
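These three operations can be sketched in plain Python over an in-memory list of events, as a stand-in for a Kafka topic. The event fields and thresholds here are invented for illustration, not taken from the talk.

```python
from collections import defaultdict

# An in-memory stand-in for an event stream of ATM withdrawals.
events = [
    {"account_id": "a1", "amount": 50},
    {"account_id": "a2", "amount": 900},
    {"account_id": "a1", "amount": 700},
]

# Filter: keep only large withdrawals as they pass by.
large = [e for e in events if e["amount"] >= 500]

# Materialized view: a continuously updated aggregate (total per account).
totals = defaultdict(int)
for e in events:  # with a real stream this loop never terminates
    totals[e["account_id"]] += e["amount"]

# Analyze in-flight: derive a signal per event without storing the events.
suspicious_accounts = {e["account_id"] for e in large}
```

In a real deployment the loop body runs continuously against new records, and the "view" lives in a state store rather than a Python dict.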
Apache Kafka, the de-facto OSS standard for
event streaming
Real-time | Uses disk structures for constant performance at petabyte scale
Scalable | Distributed; scales quickly and easily without downtime
Persistent | Persists messages on disk, enabling intra-cluster replication
Reliable | Replicates data, auto-balances consumers upon failure
• In production at more than a third of the Fortune 500
• 2 trillion messages a day at LinkedIn
• 500 billion events a day (1.3 PB) at Netflix
About Confluent: We Are the Kafka Experts
• 30% of the Fortune 100
• Confluent founders created Kafka
• Confluent team wrote 80% of Kafka
• We have over 300,000 hours of Kafka experience
Kafka Integration Architecture
(Diagram: producers and consumers exchanging events through Kafka)
Stream Processing Analogy
(Diagram: Kafka Cluster, with Connect API → Stream Processing → Connect API)

$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
KSQL is the Streaming SQL Engine for Apache Kafka
Stream processing with KSQL

CREATE STREAM ATM_POSSIBLE_FRAUD_ENRICHED AS
  SELECT t.account_id,
         a.first_name + ' ' + a.last_name cust_name,
         t.atm, t.amount,
         TIMESTAMPTOSTRING(t.ROWTIME, 'HH:mm:ss') tx_time
    FROM atm_txns t
         INNER JOIN accounts a
         ON t.account_id = a.account_id;

Simple SQL syntax for expressing reasoning along and across data streams. You can write user-defined functions in Java.
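The stream-table join this query performs can be illustrated in plain Python, with no Kafka involved: the "stream" side is an unbounded sequence of events, while the "table" side holds the latest row per key. All names and data below are invented for illustration.

```python
# The "table": latest account row per key (account_id).
accounts = {"a1": {"first_name": "Robin", "last_name": "M"}}

# The "stream": an unbounded sequence of transaction events.
atm_txns = [{"account_id": "a1", "atm": "ATM-7", "amount": 200}]

def enrich(txn, table):
    # Stream-table join: look up the current table row for the event's key.
    acct = table.get(txn["account_id"])
    if acct is None:
        return None  # INNER JOIN semantics: drop unmatched events
    return {**txn, "cust_name": acct["first_name"] + " " + acct["last_name"]}

enriched = [e for e in (enrich(t, accounts) for t in atm_txns) if e is not None]
```

The important property this mimics: each event is joined against the table's state *at the moment the event arrives*, so updating an account row changes how later events are enriched.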
KSQL in Development and Production
• Interactive KSQL (via REST) for development and testing: "Hmm, let me try out this idea..."
• Headless KSQL for production, once the desired KSQL queries have been identified
ATM Fraud Dataflow: Streaming ETL with KSQL
What does KSQL look like?
● First, load a topic into a stream:

CREATE STREAM ATM_TXNS_GESS (account_id     VARCHAR,
                             atm            VARCHAR,
                             location       STRUCT<lon DOUBLE, lat DOUBLE>,
                             amount         INT,
                             timestamp      VARCHAR,
                             transaction_id VARCHAR)
  WITH (KAFKA_TOPIC='atm_txns_gess', VALUE_FORMAT='JSON',
        TIMESTAMP='timestamp',
        TIMESTAMP_FORMAT='yyyy-MM-dd HH:mm:ss X');
What does KSQL look like?
● Create a table on a topic for reference data
● Join stream to table for enrichment

CREATE TABLE ACCOUNTS
  WITH (KAFKA_TOPIC='ACCOUNTS', VALUE_FORMAT='AVRO', KEY='ACCOUNT_ID');

CREATE STREAM ATM_POSSIBLE_FRAUD_ENRICHED AS
  SELECT T.ACCOUNT_ID AS ACCOUNT_ID, T.TX1_TIMESTAMP,
         T.TX2_TIMESTAMP, T.TX1_AMOUNT, T.TX2_AMOUNT,
         T.TX1_ATM, T.TX2_ATM, T.TX1_LOCATION, T.TX2_LOCATION,
         T.TX1_TRANSACTION_ID, T.TX2_TRANSACTION_ID,
         T.DISTANCE_BETWEEN_TXN_KM, T.MILLISECONDS_DIFFERENCE,
         T.MINUTES_DIFFERENCE, T.KMH_REQUIRED,
         A.FIRST_NAME + ' ' + A.LAST_NAME AS CUSTOMER_NAME,
         A.EMAIL AS CUSTOMER_EMAIL, A.PHONE AS CUSTOMER_PHONE,
         A.ADDRESS AS CUSTOMER_ADDRESS, A.COUNTRY AS CUSTOMER_COUNTRY
    FROM ATM_POSSIBLE_FRAUD T
         INNER JOIN ACCOUNTS A
         ON T.ACCOUNT_ID = A.ACCOUNT_ID;
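A plausible reading of the DISTANCE_BETWEEN_TXN_KM and KMH_REQUIRED columns above, sketched in Python: if the same card is used at two ATMs so far apart that the implied travel speed is implausible, the pair is a fraud candidate. The haversine formula, the coordinates, and the 100 km/h threshold are assumptions for illustration, not taken from the talk.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lon1, lat1, lon2, lat2):
    # Great-circle distance between two (lon, lat) points, in kilometres.
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def kmh_required(tx1, tx2):
    # Speed the cardholder would need to travel between the two ATMs.
    dist_km = haversine_km(tx1["lon"], tx1["lat"], tx2["lon"], tx2["lat"])
    hours = (tx2["ts_ms"] - tx1["ts_ms"]) / 3_600_000
    return dist_km / hours if hours > 0 else float("inf")

# Same card at ATMs in Berlin and Munich, ten minutes apart:
tx1 = {"lon": 13.40, "lat": 52.52, "ts_ms": 0}
tx2 = {"lon": 11.58, "lat": 48.14, "ts_ms": 600_000}
possible_fraud = kmh_required(tx1, tx2) > 100  # threshold is an assumption
```

In the KSQL pipeline this computation would run continuously over the self-joined transaction stream, emitting only the pairs that exceed the threshold.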
Demo!
Or use the Kafka Streams API
● Java or Scala
● Can do multiple joins in one operation
● Provides an interactive query API that makes it possible to query the state store
ATM Fraud Detection with Apache Kafka and KSQL
@rmoff
Confluent Hub (hub.confluent.io)
One-stop place to discover and download:
• Connectors
• Transformations
• Converters
Real-time Operations View & Analysis

Confluent Community - What next?
About 10,000 Kafkateers are collaborating every single day on the Confluent Community Slack channel! There are more than 35,000 Kafkateers in around 145 meetup groups across all five continents!

• Join the Confluent Community Slack channel: cnfl.io/community-slack
• Join your local Apache Kafka® meetup: cnfl.io/meetups
• Subscribe to the Confluent blog for frequent updates from key names in Apache Kafka® on best practices, product updates & more: cnfl.io/read
Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no
affiliation with and does not endorse the materials provided at this event.
NOMINATE YOURSELF OR A PEER AT
CONFLUENT.IO/NOMINATE
KS19Meetup.
CONFLUENT COMMUNITY DISCOUNT CODE
25% OFF*
*Standard Priced Conference pass