Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

ATM Fraud Detection with Apache Kafka and KSQL

663 views

Published on

Detecting fraudulent activity in real time can save a business significant amounts of money, but has traditionally been an area requiring a lot of complex programming and frameworks, particularly at scale. Using KSQL, it's possible to use just SQL to build scalable real-time applications.

In this talk, we'll look at:

-What KSQL is, and how its ability to join streams of events can be used to detect possibly fraudulent activity based on a stream of ATM transactions.
-How easy it is to integrate Kafka with other systems—both upstream and downstream—using Kafka Connect to stream from a database into Kafka, and from Kafka into Elasticsearch.

* Join 7000 other Confluent Platform users on our Slack group: https://slackpass.io/confluentcommunity
* Find the code and instructions to run the entire demo: https://github.com/confluentinc/demo-scene/blob/master/ksql-atm-fraud-detection
* Get started with KSQL here: https://www.confluent.io/ksql

Published in: Technology
  • Be the first to comment

ATM Fraud Detection with Apache Kafka and KSQL

  1. 1. ATM Fraud Detection @rmoff with Apache Kafka
and KSQL Photoby FreddieCollins on Unsplash
  2. 2. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Spotting fraud in realtime hotoby MirzaBabic on Unsplash
  3. 3. Photoby LasayeHommes on Unsplash
  4. 4. ATM Fraud Detection with Apache Kafka and KSQL @rmoff • Account id • Location • Amount • Inbound stream of ATM data https://github.com/rmoff/gess
  5. 5. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Demo!
  6. 6. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Spot patterns within this stream Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds
  7. 7. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Spot patterns within this stream Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Legit Legit
  8. 8. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Spot patterns within this stream Legit Dodgy! Legit
  9. 9. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Spot patterns within this stream Legit Dodgy! Legit
  10. 10. ATM Fraud Detection with Apache Kafka and KSQL @rmoff • Account id • Location • Amount • Inbound stream of ATM data https://github.com/rmoff/gess
  11. 11. ATM Fraud Detection with Apache Kafka and KSQL @rmoff KSQL : Stream Processing with SQL TXN_ID, ATM, CUSTOMER_NAME, CUSTOMER_PHONE ATM_POSSIBLE_FRAUD; SELECT FROM
  12. 12. ATM Fraud Detection with Apache Kafka and KSQL @rmoff
  13. 13. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Customer details ATM fraud txns with customer details Elasticsearch Notification service 1. Spot fraud in stream of transactions 2.Enrich transaction events with customer data
  14. 14. ATM Fraud Detection with Apache Kafka and KSQL @rmoff KSQLis the Streaming SQL Engine for Apache Kafka
  15. 15. ATM Fraud Detection with Apache Kafka and KSQL @rmoff KSQL for Real-Time Monitoring • Log data monitoring, tracking and alerting • syslog data • Sensor / IoT data CREATE STREAM SYSLOG_INVALID_USERS AS SELECT HOST, MESSAGE FROM SYSLOG WHERE MESSAGE LIKE '%Invalid user%'; http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
  16. 16. ATM Fraud Detection with Apache Kafka and KSQL @rmoff KSQL for Streaming ETL CREATE STREAM vip_actions AS 
 SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id 
 WHERE u.level = 'Platinum'; Joining, filtering, and aggregating streams of event data
  17. 17. ATM Fraud Detection with Apache Kafka and KSQL @rmoff KSQL for Anomaly Detection CREATE TABLE possible_fraud AS
 SELECT card_number, count(*)
 FROM authorization_attempts 
 WINDOW TUMBLING (SIZE 5 SECONDS)
 GROUP BY card_number
 HAVING count(*) > 3; Identifying patterns or anomalies in real-time data, surfaced in milliseconds
  18. 18. ATM Fraud Detection with Apache Kafka and KSQL @rmoff CREATE STREAM pageviews WITH (PARTITIONS=4, VALUE_FORMAT='AVRO') AS 
 SELECT * FROM pageviews_json; KSQL for Data Transformation Make simple derivations of existing topics from the command line
  19. 19. ATM Fraud Detection with Apache Kafka and KSQL @rmoff KSQL in Development and Production Interactive KSQL
 for development and testing Headless KSQL
 for Production Desired KSQL queries have been identified REST “Hmm, let me try
 out this idea...”
  20. 20. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Stream Stream joins Orders Shipments Which orders haven't shipped? order.id = shipment.order_id Leadtime shipment_ts - order_ts
  21. 21. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Stream Stream joins ATM transactions
  22. 22. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Stream Stream joins ATM transactions
  23. 23. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Demo!
  24. 24. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Self-Join (Cartesian product) Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds T Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds T1 T2
  25. 25. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds T1 T2 Self-Join (Cartesian product) ATM_TXNS T1 INNER JOIN ATM_TXNS T2 ON T1.ACCOUNT_ID = T2.ACCOUNT_ID
  26. 26. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Self-Join (Cartesian product) FROM ATM_TXNS T1 INNER JOIN ATM_TXNS T2 WITHIN 10 MINUTES ON T1.ACCOUNT_ID = T2.ACCOUNT_ID Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds T1 T2
  27. 27. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Self-Join T1 Txn ID T2 Txn ID T1 Time T2 Time T1 ATM T2 ATM xxx116d91d6-ef17 116d91d6-ef17 11:56:58 11:58:19 Midland Halifax 116d91d6-ef17 xxx116d91d6-ef17 11:58:19 11:56:58 Halifax Midland xxx116d91d6-ef17 xxx116d91d6-ef17 11:56:58 11:56:58 Midland Midland 116d91d6-ef17 116d91d6-ef17 11:58:19 11:58:19 Halifax Halifax
  28. 28. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Self-Join T1 Txn ID T2 Txn ID T1 Time T2 Time T1 ATM T2 ATM xxx116d91d6-ef17 116d91d6-ef17 11:56:58 11:58:19 Midland Halifax 116d91d6-ef17 xxx116d91d6-ef17 11:58:19 11:56:58 Halifax Midland xxx116d91d6-ef17 xxx116d91d6-ef17 11:56:58 11:56:58 Midland Midland 116d91d6-ef17 116d91d6-ef17 11:58:19 11:58:19 Halifax Halifax Self join on same txn IDs
  29. 29. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Exclude joins on the same txn WHERE T1.TRANSACTION_ID != T2.TRANSACTION_ID T1 Txn ID T2 Txn ID T1 Time T2 Time T1 ATM T2 ATM xxx116d91d6-ef17 116d91d6-ef17 11:56:58 11:58:19 Midland Halifax 116d91d6-ef17 xxx116d91d6-ef17 11:58:19 11:56:58 Halifax Midland
  30. 30. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Exclude joins on the same txn T1 Txn ID T2 Txn ID T1 Time T2 Time T1 ATM T2 ATM xxx116d91d6-ef17 116d91d6-ef17 11:56:58 11:58:19 Midland Halifax 116d91d6-ef17 xxx116d91d6-ef17 11:58:19 11:56:58 Halifax Midland Duplicate results (A:B / B:A)
  31. 31. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Join Windows Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds T1 T2 WITHIN 10 MINUTES WHERE T1.TRANSACTION_ID != T2.TRANSACTION_ID
  32. 32. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Join Windows Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds T1 T2 WITHIN 10 MINUTES WHERE T1.TRANSACTION_ID != T2.TRANSACTION_ID
  33. 33. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Join Windows Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds T1 T2 WITHIN 10 MINUTES WHERE T1.TRANSACTION_ID != T2.TRANSACTION_ID
  34. 34. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Only join forward Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds T1 T2 WITHIN (0 MINUTES, 10 MINUTES) WHERE T1.TRANSACTION_ID != T2.TRANSACTION_ID
  35. 35. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Only join forward Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds T1 T2 WITHIN (0 MINUTES, 10 MINUTES) WHERE T1.TRANSACTION_ID != T2.TRANSACTION_ID
  36. 36. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Only join forward T1 Txn ID T2 Txn ID T1 Time T2 Time T1 ATM T2 ATM xxx116d91d6-ef17 116d91d6-ef17 11:56:58 11:58:19 Midland Halifax WITHIN (0 MINUTES, 10 MINUTES) Ignore events in the right-hand stream prior to those in the left
  37. 37. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Only join forward T1 Txn ID T2 Txn ID T1 Time T2 Time T1 ATM T2 ATM xxx116d91d6-ef17 116d91d6-ef17 11:56:58 11:58:19 Midland Halifax Legit Dodgy!
  38. 38. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Photoby EstebanLopez on Unsplash
  39. 39. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Calcuate distance between ATMs GEO_DISTANCE(TX1.location->lat, TX1.location->lon, TX2.location->lat, TX2.location->lon, 'KM') TX1 TX2
  40. 40. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Calculate time between transactions TX2.ROWTIME - TX1.ROWTIME AS MILLISECONDS_DIFFERENCE (TX2.ROWTIME - TX1.ROWTIME) / 1000 / 60 / 60 AS HOURS_DIFFERENCE
  41. 41. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Photoby EstebanLopez on Unsplash GEO_DISTANCE(…) / HOURS_DIFFERENCE AS KMH_REQUIRED
  42. 42. ATM Fraud Detection with Apache Kafka and KSQL @rmoff So speaking of time… ksql> PRINT 'atm_txns_gess' ; Format:JSON { "ROWTIME": 1544116309152, "ROWKEY": "null", "account_id": "a218", "timestamp": "2018-12-06 17:09:58 +0000", "atm": "HSBC", …} Kafka message timestamp 2018-12-06 17:11:49 Event time
  43. 43. ksql> PRINT 'atm_txns_gess' ; Format:JSON { "ROWTIME": 1544116309152, "ROWKEY": "null", "account_id": "a218", "timestamp": "2018-12-06 17:09:58 +0000", CREATE STREAM ATM_TXNS_GESS (account_id VARCHAR, timestamp VARCHAR, … WITH (KAFKA_TOPIC='atm_txns_gess', TIMESTAMP='timestamp', TIMESTAMP_FORMAT= 'yyyy-MM-dd HH:mm:ss X'); "timestamp": "2018-12-06 17:09:58 +0000",
  44. 44. ATM Fraud Detection with Apache Kafka and KSQL @rmoff But what about the account holder? !
  45. 45. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Photoby SamuelZeller on Unsplash
  46. 46. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Customer details ATM fraud txns with customer details Elasticsearch Notification service 1. Enrich transaction events with customer data
  47. 47. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Streaming Integration with Kafka Connect Kafka Brokers Kafka Connect Tasks Workers Sourcessyslog flat file CSV JSON MQTT
  48. 48. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Streaming Integration with Kafka Connect Kafka Brokers Kafka Connect Tasks Workers Sinks Amazon S3 MQTT
  49. 49. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Streaming Integration with Kafka Connect Kafka Brokers Kafka Connect Tasks Workers Sources Sinks Amazon S3 MQTT syslog flat file CSV JSON MQTT
  50. 50. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Confluent Hub hub.confluent.io • One-stop place to discover and download : • Connectors • Transformations • Converters
  51. 51. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Demo Time! Customer details Kafka Connect Debezium
  52. 52. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Do you think that’s a table you are querying?
  53. 53. ATM Fraud Detection with Apache Kafka and KSQL @rmoff The Table Stream Duality Account ID Balance 12345 €50 Account ID Amount 12345 + €50 12345 + €25 12345 -€60 Account ID Balance 12345 €75 Account ID Balance 12345 €15 Time Stream Table
  54. 54. ATM Fraud Detection with Apache Kafka and KSQL @rmoff The truth is the log. The database is a cache of a subset of the log. —Pat Helland Immutability Changes Everything http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf Photo by Bobby Burch on Unsplash
  55. 55. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Ac. ID Transaction ID Time ATM A42 xxx116d91d6-ef17 11:56:58 Midland A42 116d91d6-ef17 11:58:19 Halifax A42 09c2f660-ef17 19:31:11 Lloyds Spot patterns within this stream Legit Dodgy! Legit
  56. 56. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Ac. ID T1 Time ATM T2 Time ATM A42 11:56:58 Midland 11:58:19 Halifax Suspect Transactions Dodgy!
  57. 57. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Name Phone Ac. ID T1 Time ATM T2 Time ATM Robin M 1234 567 A42 11:56:58 Midland 11:58:19 Halifax Suspect Transactions
  58. 58. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Customer details ATM fraud txns with customer details Elasticsearch Notification service 1. Spot fraud in stream of transactions 2.Enrich transaction events with customer data
  59. 59. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Customer details ATM fraud txns with customer details Elasticsearch Notification service 1. Spot fraud in stream of transactions 2.Enrich transaction events with customer data ATM_POSSIBLE_FRAUD_ENRICHED atm_txns_gess accounts
  60. 60. ATM Fraud Detection with Apache Kafka and KSQL @rmoff What can we do with it? Photoby JoshuaRodriguez on Unsplash
  61. 61. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Realtime Operations View & Analysis
  62. 62. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Push notification to the customer
  63. 63. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Confluent Community Components Apache Kafka with a bunch of cool stuff! For free! Database Changes Log Events loT Data Web Events … CRM Data Warehouse Database Hadoop Data
 Integration … Monitoring Analytics Custom Apps Transformations Real-time Applications … Confluent Platform Confluent Platform Apache Kafka® Core | Connect API | Streams API Data Compatibility Schema Registry Monitoring & Administration Confluent Control Center | Security Operations Replicator | Auto Data Balancing Development and Connectivity Clients | Connectors | REST Proxy | CLI SQL Stream Processing KSQL Datacenter Public Cloud Confluent Cloud CONFLUENT FULLY-MANAGEDCUSTOMER SELF-MANAGED
  64. 64. ATM Fraud Detection with Apache Kafka and KSQL @rmoff Free Books! https://www.confluent.io/apache-kafka-stream-processing-book-bundle
  65. 65. @rmoff robin@confluent.io https://www.confluent.io/ksql http://cnfl.io/slack https://cnfl.io/demo-scene

×