SlideShare a Scribd company logo
Real Time
Anomaly Detection
Patterns and reference architectures
Gwen Shapira, System Architect
©2014 Cloudera, Inc. All rights reserved.
Overview
• Intro
• Review Problem
• Quick overview of key technology
• High level architecture
• Deep Dive into NRT Processing
• Completing the Puzzle – Micro-batch, Ingest and Batch
©2014 Cloudera, Inc. All rights reserved.
Gwen Shapira
• 15 years of moving data
• Formerly consultant, engineer
• System Architect @ Confluent
• Kafka Committer
• @gwenshap
There’s a Book on That
Founded by creators of Kafka - @jaykreps, @nehanarkhede, @junrao
We help you gather, transport, organize, and analyze all of your stream data
What we offer
• Confluent Platform
• Kafka plus critical bug fixes not yet applied in Apache release
• Kafka ecosystem projects
• Enterprise support
• Training and Professional Services
©2014 Cloudera, Inc. All rights reserved.
The Problem
©2014 Cloudera, Inc. All rights reserved.
Credit Card Transaction Fraud
©2014 Cloudera, Inc. All rights reserved.
Coupon Fraud
©2014 Cloudera, Inc. All rights reserved.
Video Game Strategy
©2014 Cloudera, Inc. All rights reserved.
Health Insurance Fraud
©2014 Cloudera, Inc. All rights reserved.
How do we React
• Human Brain at Tennis
• Muscle Memory
• Reaction Thought
• Reflective Meditation
©2014 Cloudera, Inc. All rights reserved.
Overview of
Key Technologies
©2014 Cloudera, Inc. All Rights Reserved.
Kafka
©2014 Cloudera, Inc. All rights reserved.
The Basics
• Messages are organized into topics
• Producers push messages
• Consumers pull messages
• Kafka runs in a cluster. Nodes are called brokers
©2014 Cloudera, Inc. All rights reserved.
Topics, Partitions and Logs
©2014 Cloudera, Inc. All rights reserved.
Each partition is a log
©2014 Cloudera, Inc. All rights reserved.
Each Broker has many partitions
Partition 0 Partition 0
Partition 1 Partition 1
Partition 2
Partition 1
Partition 0
Partition 2 Partion 2
©2014 Cloudera, Inc. All rights reserved.
Producers load balance between partitions
Partition 0
Partition 1
Partition 2
Partition 1
Partition 0
Partition 2
Partition 0
Partition 1
Partion 2
Client
©2014 Cloudera, Inc. All rights reserved.
Producers load balance between partitions
Partition 0
Partition 1
Partition 2
Partition 1
Partition 0
Partition 2
Partition 0
Partition 1
Partion 2
Client
Consumers
Consumer Group Y
Consumer Group X
Consumer
Kafka Cluster
Topic
Partition A (File)
Partition B (File)
Partition C (File)
Consumer
Consumer
Consumer
Order retained with in
partition
Order retained with in
partition but not over
partitionsOffSetX
OffSetX
OffSetX
OffSetYOffSetYOffSetY
Off sets are kept per
consumer group
Consumer-Producer Pattern
Keeping Things Simple
• Consume records from Kafka Topic
• Filter, transform, join, lookups, aggregate
• Write to another Kafka Topic
• https://github.com/confluentinc/examples/tree/master/specific-avro-
consumer
Kafka Makes Streams Easy
• Producers partition the data
• Consumers load balance partitions
• Add / remove consumers any way you want
• Will work with any framework (or none!)
Coming Soon to Kafka Near You
• KafkaConnect - Export / Import for Kafka - 0.9.0 (Its here!)
• KStream
• Consumer-Producer client - Processor (0.10.0 - April?)
• DSLs:
• KStream (a bit like Spark) - (0.10.0 - April?)
• SQL - ???
KConnect - Its a thing
• Easy to add connectors to Kafka
• Existing connectors
• JDBC
• HDFS
• MySQL * 2
• ElasticSearch * 4
• Cassandra
• S3 * 2
• MQTT
• Twitter
• Kafka Connectors:
• http://www.confluent.io/developers/connectors
• http://docs.confluent.io/2.0.0/connect/index.html
• KStreams:
• https://github.com/gwenshap/kafka-
examples/blob/master/KafkaStreamsAvg
SparkStreaming
©2014 Cloudera, Inc. All rights reserved.
Spark Example
1. val conf = new SparkConf().setMaster("local[2]”)
2. val sc = new SparkContext(conf)
3. val lines = sc.textFile(path, 2)
4. val words = lines.flatMap(_.split(" "))
5. val pairs = words.map(word => (word, 1))
6. val wordCounts = pairs.reduceByKey(_ + _)
7. wordCounts.print()
©2014 Cloudera, Inc. All rights reserved.
Spark Streaming Example
1. val conf = new SparkConf().setMaster("local[2]”)
2. val ssc = new StreamingContext(conf, Seconds(1))
3. val lines = ssc.socketTextStream("localhost", 9999)
4. val words = lines.flatMap(_.split(" "))
5. val pairs = words.map(word => (word, 1))
6. val wordCounts = pairs.reduceByKey(_ + _)
7. wordCounts.print()
8. SSC.start()
Spark Streaming
Confidentiality Information Goes Here
DStream
DStream
DStream
Single Pass
Source Receiver RDD
Source Receiver RDD
RDD
Filter Count Print
Source Receiver RDD
RDD
RDD
Single Pass
Filter Count Print
Pre-first
Batch
First
Batch
Second
Batch
Confidentiality Information Goes Here
DStream
DStream
DStream
Single Pass
Source Receiver RDD
Source Receiver RDD
RDD
Filter Count
Print
Source Receiver RDD
RDD
RDD
Single Pass
Filter Count
Pre-first
Batch
First
Batch
Second
Batch
Stateful RDD 1
Print
Stateful RDD 2
Stateful RDD 1
©2014 Cloudera, Inc. All rights reserved.
High Level Architecture
©2014 Cloudera, Inc. All rights reserved.
Real-Time Event Processing Approach
Hadoop Cluster II
Storage Processing
SolR
Hadoop Cluster I
ClientClient
Flume Agents
Hbase /
Memory
Spark
Streaming
HDFS
Hive/Impa
la
Map/Redu
ce
Spark
Search
Automated &
Manual
Analytical
Adjustments
and Pattern
detection
Fetching &
Updating Profiles
HDFSEventSink
SolR Sink
Batch Time Adjustments
Automated &
Manual
Review of NRT
Changes and
Counters
Local Cache
Kafka
Clients:
(Swipe here!)
Web App
Adjust NRT Statistics
Yarn / Mesos
Analytics Layer
SolR
Client
Client
KStreams
Analytical
Adjustments
and Pattern
detection
Fetching &
Updating Profiles
Adjusting NRT Stats Batch Time Adjustments
Review of
NRT
Changes and
Counters
Local Cache
Kafka
Clients:
(Swipe
here!)
Web App Kafka
HDFS
NoSQL
DWH
Connecor
Connector
KStream
Processor
Profile Updates
Model Updates
Transactions
Local
Store
Decisions
DWH
RedoLog
KStream
Processor
KStream
Processor
©2014 Cloudera, Inc. All rights reserved.
NRT Processing
©2014 Cloudera, Inc. All rights reserved.
Focus on NRT First
Hadoop Cluster II
Storage Processing
SolR
Hadoop Cluster I
ClientClient
Processor
Hbase /
Memory
Spark
Streaming
HDFS
Hive/Impa
la
Map/Redu
ce
Spark
Search
Automated &
Manual
Analytical
Adjustments
and Pattern
detection
Fetching &
Updating Profiles
HDFSEventSink
SolR Sink
Batch Time Adjustments
Automated &
Manual
Review of NRT
Changes and
Counters
Local Cache
Kafka
Clients:
(Swipe here!)
Web App
Adjust NRT Statistics
©2014 Cloudera, Inc. All rights reserved.
Streaming Architecture – NRT Event
Processing
Kafka
Initial Events Topic
Event Processing Logic
Local
Memory
HBase
Client
Kafka
Answer Topic
HBase
KafkaConsumer
KafkaProducer
Able to respond with in
10s of milliseconds
©2014 Cloudera, Inc. All rights reserved.
Partitioned NRT Event
Processing
Kafka
Initial Events Topic
Event Processing Logic
Local
Cache
HBase
Client
Kafka
Answer Topic
HBase
KafkaConsumer
KafkaProducer
Topic
Partition A
Partition B
Partition C
Producer
Partitioner
Producer
Partitioner
Producer
Partitioner Custom Partitioner
Better use of local
memory
©2014 Cloudera, Inc. All rights reserved.
Questions?
http://confluent.io
@confluentInc
@gwenshap
gwen@confluent.io

More Related Content

What's hot

Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Amazon Web Services
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
DataWorks Summit
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
Gwen (Chen) Shapira
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
Hari Shreedharan
 
Real time analytics with Kafka and SparkStreaming
Real time analytics with Kafka and SparkStreamingReal time analytics with Kafka and SparkStreaming
Real time analytics with Kafka and SparkStreaming
Ashish Singh
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
Joe Stein
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
Hari Shreedharan
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and Storm
Edureka!
 
Spark streaming and Kafka
Spark streaming and KafkaSpark streaming and Kafka
Spark streaming and Kafka
Iraj Hedayati
 
Simplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & TroubleshootingSimplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & Troubleshooting
DataWorks Summit/Hadoop Summit
 
Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructure
mattlieber
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Shravan (Sean) Pabba
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtime
Guido Schmutz
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
datamantra
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
P. Taylor Goetz
 
Kafka spark cassandra webinar feb 16 2016
Kafka spark cassandra   webinar feb 16 2016 Kafka spark cassandra   webinar feb 16 2016
Kafka spark cassandra webinar feb 16 2016
Hiromitsu Komatsu
 
fluentd -- the missing log collector
fluentd -- the missing log collectorfluentd -- the missing log collector
fluentd -- the missing log collector
Muga Nishizawa
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)
W2O Group
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
Rahul Jain
 
Data Pipeline with Kafka
Data Pipeline with KafkaData Pipeline with Kafka
Data Pipeline with Kafka
Peerapat Asoktummarungsri
 

What's hot (20)

Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Real time analytics with Kafka and SparkStreaming
Real time analytics with Kafka and SparkStreamingReal time analytics with Kafka and SparkStreaming
Real time analytics with Kafka and SparkStreaming
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and Storm
 
Spark streaming and Kafka
Spark streaming and KafkaSpark streaming and Kafka
Spark streaming and Kafka
 
Simplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & TroubleshootingSimplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & Troubleshooting
 
Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructure
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtime
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
Kafka spark cassandra webinar feb 16 2016
Kafka spark cassandra   webinar feb 16 2016 Kafka spark cassandra   webinar feb 16 2016
Kafka spark cassandra webinar feb 16 2016
 
fluentd -- the missing log collector
fluentd -- the missing log collectorfluentd -- the missing log collector
fluentd -- the missing log collector
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Data Pipeline with Kafka
Data Pipeline with KafkaData Pipeline with Kafka
Data Pipeline with Kafka
 

Viewers also liked

Streaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupStreaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data Meetup
Gwen (Chen) Shapira
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be there
Gwen (Chen) Shapira
 
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersNyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
Gwen (Chen) Shapira
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
Gwen (Chen) Shapira
 
Fraud Detection with Hadoop
Fraud Detection with HadoopFraud Detection with Hadoop
Fraud Detection with Hadoop
markgrover
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
Gwen (Chen) Shapira
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detection
hadooparchbook
 
Confluent building a real-time streaming platform using kafka streams and k...
Confluent   building a real-time streaming platform using kafka streams and k...Confluent   building a real-time streaming platform using kafka streams and k...
Confluent building a real-time streaming platform using kafka streams and k...
Thomas Alex
 
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines,  API, Messaging and Stream ProcessingJustGiving – Serverless Data Pipelines,  API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
Luis Gonzalez
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
Gwen (Chen) Shapira
 
Streaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaStreaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache Kafka
Attunity
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8
Cloudera, Inc.
 
Moving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduMoving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache Kudu
Cloudera, Inc.
 
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
confluent
 
Data Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on HadoopData Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on Hadoopskaluska
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
confluent
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Mike Percy
 
Bigdata based fraud detection
Bigdata based fraud detectionBigdata based fraud detection
Bigdata based fraud detection
Mk Kim
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache SparkCloudera, Inc.
 
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataKudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Cloudera, Inc.
 

Viewers also liked (20)

Streaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupStreaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data Meetup
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be there
 
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersNyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
Fraud Detection with Hadoop
Fraud Detection with HadoopFraud Detection with Hadoop
Fraud Detection with Hadoop
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detection
 
Confluent building a real-time streaming platform using kafka streams and k...
Confluent   building a real-time streaming platform using kafka streams and k...Confluent   building a real-time streaming platform using kafka streams and k...
Confluent building a real-time streaming platform using kafka streams and k...
 
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines,  API, Messaging and Stream ProcessingJustGiving – Serverless Data Pipelines,  API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
 
Streaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaStreaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache Kafka
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8
 
Moving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduMoving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache Kudu
 
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
 
Data Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on HadoopData Ingestion, Extraction & Parsing on Hadoop
Data Ingestion, Extraction & Parsing on Hadoop
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
 
Bigdata based fraud detection
Bigdata based fraud detectionBigdata based fraud detection
Bigdata based fraud detection
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
 
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataKudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
 

Similar to Fraud Detection for Israel BigThings Meetup

Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
hadooparchbook
 
Architecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopArchitecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with Hadoop
DataWorks Summit
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafka
confluent
 
Avoiding Common Pitfalls: Spark Structured Streaming with Kafka
Avoiding Common Pitfalls: Spark Structured Streaming with KafkaAvoiding Common Pitfalls: Spark Structured Streaming with Kafka
Avoiding Common Pitfalls: Spark Structured Streaming with Kafka
HostedbyConfluent
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
Grant Henke
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
confluent
 
Big datadc skyfall_preso_v2
Big datadc skyfall_preso_v2Big datadc skyfall_preso_v2
Big datadc skyfall_preso_v2
abramsm
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Cloudera, Inc.
 
Fast Streaming into Clickhouse with Apache Pulsar
Fast Streaming into Clickhouse with Apache PulsarFast Streaming into Clickhouse with Apache Pulsar
Fast Streaming into Clickhouse with Apache Pulsar
Timothy Spann
 
WebRTC Webinar & Q&A - Sumilcast Standards & Implementation
WebRTC Webinar & Q&A - Sumilcast Standards & ImplementationWebRTC Webinar & Q&A - Sumilcast Standards & Implementation
WebRTC Webinar & Q&A - Sumilcast Standards & Implementation
Amir Zmora
 
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Data Con LA
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream Processing
Jack Gudenkauf
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
Hari Shreedharan
 
Kubermatic.pdf
Kubermatic.pdfKubermatic.pdf
Kubermatic.pdf
LibbySchulze
 
Kubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdfKubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdf
LibbySchulze
 
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
confluent
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
LivePerson
 
Spark etl
Spark etlSpark etl
Spark etl
Imran Rashid
 
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsPortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
Timothy Spann
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
confluent
 

Similar to Fraud Detection for Israel BigThings Meetup (20)

Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
 
Architecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopArchitecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with Hadoop
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafka
 
Avoiding Common Pitfalls: Spark Structured Streaming with Kafka
Avoiding Common Pitfalls: Spark Structured Streaming with KafkaAvoiding Common Pitfalls: Spark Structured Streaming with Kafka
Avoiding Common Pitfalls: Spark Structured Streaming with Kafka
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
Big datadc skyfall_preso_v2
Big datadc skyfall_preso_v2Big datadc skyfall_preso_v2
Big datadc skyfall_preso_v2
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
 
Fast Streaming into Clickhouse with Apache Pulsar
Fast Streaming into Clickhouse with Apache PulsarFast Streaming into Clickhouse with Apache Pulsar
Fast Streaming into Clickhouse with Apache Pulsar
 
WebRTC Webinar & Q&A - Sumilcast Standards & Implementation
WebRTC Webinar & Q&A - Sumilcast Standards & ImplementationWebRTC Webinar & Q&A - Sumilcast Standards & Implementation
WebRTC Webinar & Q&A - Sumilcast Standards & Implementation
 
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream Processing
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Kubermatic.pdf
Kubermatic.pdfKubermatic.pdf
Kubermatic.pdf
 
Kubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdfKubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdf
 
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
 
Spark etl
Spark etlSpark etl
Spark etl
 
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsPortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
 

More from Gwen (Chen) Shapira

Velocity 2019 - Kafka Operations Deep Dive
Velocity 2019  - Kafka Operations Deep DiveVelocity 2019  - Kafka Operations Deep Dive
Velocity 2019 - Kafka Operations Deep Dive
Gwen (Chen) Shapira
 
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
Gwen (Chen) Shapira
 
Gluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service mesh
Gwen (Chen) Shapira
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Gwen (Chen) Shapira
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebook
Gwen (Chen) Shapira
 
Kafka reliability velocity 17
Kafka reliability   velocity 17Kafka reliability   velocity 17
Kafka reliability velocity 17
Gwen (Chen) Shapira
 
R for hadoopers
R for hadoopersR for hadoopers
R for hadoopers
Gwen (Chen) Shapira
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureGwen (Chen) Shapira
 
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
Gwen (Chen) Shapira
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopGwen (Chen) Shapira
 
Scaling etl with hadoop shapira 3
Scaling etl with hadoop   shapira 3Scaling etl with hadoop   shapira 3
Scaling etl with hadoop shapira 3Gwen (Chen) Shapira
 
Is hadoop for you
Is hadoop for youIs hadoop for you
Is hadoop for you
Gwen (Chen) Shapira
 
Visualizing database performance hotsos 13-v2
Visualizing database performance   hotsos 13-v2Visualizing database performance   hotsos 13-v2
Visualizing database performance hotsos 13-v2Gwen (Chen) Shapira
 

More from Gwen (Chen) Shapira (16)

Velocity 2019 - Kafka Operations Deep Dive
Velocity 2019  - Kafka Operations Deep DiveVelocity 2019  - Kafka Operations Deep Dive
Velocity 2019 - Kafka Operations Deep Dive
 
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
 
Gluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service mesh
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebook
 
Kafka reliability velocity 17
Kafka reliability   velocity 17Kafka reliability   velocity 17
Kafka reliability velocity 17
 
R for hadoopers
R for hadoopersR for hadoopers
R for hadoopers
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding Failure
 
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for Hadoop
 
Scaling etl with hadoop shapira 3
Scaling etl with hadoop   shapira 3Scaling etl with hadoop   shapira 3
Scaling etl with hadoop shapira 3
 
Is hadoop for you
Is hadoop for youIs hadoop for you
Is hadoop for you
 
Ssd collab13
Ssd   collab13Ssd   collab13
Ssd collab13
 
Integrated dwh 3
Integrated dwh 3Integrated dwh 3
Integrated dwh 3
 
Visualizing database performance hotsos 13-v2
Visualizing database performance   hotsos 13-v2Visualizing database performance   hotsos 13-v2
Visualizing database performance hotsos 13-v2
 
Flexible Design
Flexible DesignFlexible Design
Flexible Design
 

Recently uploaded

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 

Recently uploaded (20)

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 

Fraud Detection for Israel BigThings Meetup

  • 1. Real Time Anomaly Detection Patterns and reference architectures Gwen Shapira, System Architect
  • 2. ©2014 Cloudera, Inc. All rights reserved. Overview • Intro • Review Problem • Quick overview of key technology • High level architecture • Deep Dive into NRT Processing • Completing the Puzzle – Micro-batch, Ingest and Batch
  • 3. ©2014 Cloudera, Inc. All rights reserved. Gwen Shapira • 15 years of moving data • Formerly consultant, engineer • System Architect @ Confluent • Kafka Committer • @gwenshap
  • 5. Founded by creators of Kafka - @jaykreps, @nehanarkhede, @junrao We help you gather, transport, organize, and analyze all of your stream data What we offer • Confluent Platform • Kafka plus critical bug fixes not yet applied in Apache release • Kafka ecosystem projects • Enterprise support • Training and Professional Services
  • 6. ©2014 Cloudera, Inc. All rights reserved. The Problem
  • 7. ©2014 Cloudera, Inc. All rights reserved. Credit Card Transaction Fraud
  • 8. ©2014 Cloudera, Inc. All rights reserved. Coupon Fraud
  • 9. ©2014 Cloudera, Inc. All rights reserved. Video Game Strategy
  • 10. ©2014 Cloudera, Inc. All rights reserved. Health Insurance Fraud
  • 11. ©2014 Cloudera, Inc. All rights reserved. How do we React • Human Brain at Tennis • Muscle Memory • Reaction Thought • Reflective Meditation
  • 12. ©2014 Cloudera, Inc. All rights reserved. Overview of Key Technologies
  • 13. ©2014 Cloudera, Inc. All Rights Reserved. Kafka
  • 14. ©2014 Cloudera, Inc. All rights reserved. The Basics • Messages are organized into topics • Producers push messages • Consumers pull messages • Kafka runs in a cluster. Nodes are called brokers
  • 15. ©2014 Cloudera, Inc. All rights reserved. Topics, Partitions and Logs
  • 16. ©2014 Cloudera, Inc. All rights reserved. Each partition is a log
  • 17. ©2014 Cloudera, Inc. All rights reserved. Each Broker has many partitions Partition 0 Partition 0 Partition 1 Partition 1 Partition 2 Partition 1 Partition 0 Partition 2 Partion 2
  • 18. ©2014 Cloudera, Inc. All rights reserved. Producers load balance between partitions Partition 0 Partition 1 Partition 2 Partition 1 Partition 0 Partition 2 Partition 0 Partition 1 Partion 2 Client
  • 19. ©2014 Cloudera, Inc. All rights reserved. Producers load balance between partitions Partition 0 Partition 1 Partition 2 Partition 1 Partition 0 Partition 2 Partition 0 Partition 1 Partion 2 Client
  • 20. Consumers Consumer Group Y Consumer Group X Consumer Kafka Cluster Topic Partition A (File) Partition B (File) Partition C (File) Consumer Consumer Consumer Order retained with in partition Order retained with in partition but not over partitionsOffSetX OffSetX OffSetX OffSetYOffSetYOffSetY Off sets are kept per consumer group
  • 22. Keeping Things Simple • Consume records from Kafka Topic • Filter, transform, join, lookups, aggregate • Write to another Kafka Topic • https://github.com/confluentinc/examples/tree/master/specific-avro- consumer
  • 23. Kafka Makes Streams Easy • Producers partition the data • Consumers load balance partitions • Add / remove consumers any way you want • Will work with any framework (or none!)
  • 24. Coming Soon to Kafka Near You • KafkaConnect - Export / Import for Kafka - 0.9.0 (Its here!) • KStream • Consumer-Producer client - Processor (0.10.0 - April?) • DSLs: • KStream (a bit like Spark) - (0.10.0 - April?) • SQL - ???
  • 25. KConnect - Its a thing • Easy to add connectors to Kafka • Existing connectors • JDBC • HDFS • MySQL * 2 • ElasticSearch * 4 • Cassandra • S3 * 2 • MQTT • Twitter
  • 26.
  • 27. • Kafka Connectors: • http://www.confluent.io/developers/connectors • http://docs.confluent.io/2.0.0/connect/index.html • KStreams: • https://github.com/gwenshap/kafka- examples/blob/master/KafkaStreamsAvg
  • 29. ©2014 Cloudera, Inc. All rights reserved. Spark Example 1. val conf = new SparkConf().setMaster("local[2]”) 2. val sc = new SparkContext(conf) 3. val lines = sc.textFile(path, 2) 4. val words = lines.flatMap(_.split(" ")) 5. val pairs = words.map(word => (word, 1)) 6. val wordCounts = pairs.reduceByKey(_ + _) 7. wordCounts.print()
  • 30. ©2014 Cloudera, Inc. All rights reserved. Spark Streaming Example 1. val conf = new SparkConf().setMaster("local[2]”) 2. val ssc = new StreamingContext(conf, Seconds(1)) 3. val lines = ssc.socketTextStream("localhost", 9999) 4. val words = lines.flatMap(_.split(" ")) 5. val pairs = words.map(word => (word, 1)) 6. val wordCounts = pairs.reduceByKey(_ + _) 7. wordCounts.print() 8. SSC.start()
  • 31. Spark Streaming Confidentiality Information Goes Here DStream DStream DStream Single Pass Source Receiver RDD Source Receiver RDD RDD Filter Count Print Source Receiver RDD RDD RDD Single Pass Filter Count Print Pre-first Batch First Batch Second Batch
  • 32. Confidentiality Information Goes Here DStream DStream DStream Single Pass Source Receiver RDD Source Receiver RDD RDD Filter Count Print Source Receiver RDD RDD RDD Single Pass Filter Count Pre-first Batch First Batch Second Batch Stateful RDD 1 Print Stateful RDD 2 Stateful RDD 1
  • 33. ©2014 Cloudera, Inc. All rights reserved. High Level Architecture
  • 34. ©2014 Cloudera, Inc. All rights reserved. Real-Time Event Processing Approach Hadoop Cluster II Storage Processing SolR Hadoop Cluster I ClientClient Flume Agents Hbase / Memory Spark Streaming HDFS Hive/Impa la Map/Redu ce Spark Search Automated & Manual Analytical Adjustments and Pattern detection Fetching & Updating Profiles HDFSEventSink SolR Sink Batch Time Adjustments Automated & Manual Review of NRT Changes and Counters Local Cache Kafka Clients: (Swipe here!) Web App Adjust NRT Statistics
  • 35. Yarn / Mesos Analytics Layer SolR Client Client KStreams Analytical Adjustments and Pattern detection Fetching & Updating Profiles Adjusting NRT Stats Batch Time Adjustments Review of NRT Changes and Counters Local Cache Kafka Clients: (Swipe here!) Web App Kafka HDFS NoSQL DWH Connecor Connector
  • 37. ©2014 Cloudera, Inc. All rights reserved. NRT Processing
  • 38. ©2014 Cloudera, Inc. All rights reserved. Focus on NRT First Hadoop Cluster II Storage Processing SolR Hadoop Cluster I ClientClient Processor Hbase / Memory Spark Streaming HDFS Hive/Impa la Map/Redu ce Spark Search Automated & Manual Analytical Adjustments and Pattern detection Fetching & Updating Profiles HDFSEventSink SolR Sink Batch Time Adjustments Automated & Manual Review of NRT Changes and Counters Local Cache Kafka Clients: (Swipe here!) Web App Adjust NRT Statistics
  • 39. ©2014 Cloudera, Inc. All rights reserved. Streaming Architecture – NRT Event Processing Kafka Initial Events Topic Event Processing Logic Local Memory HBase Client Kafka Answer Topic HBase KafkaConsumer KafkaProducer Able to respond with in 10s of milliseconds
  • 40. ©2014 Cloudera, Inc. All rights reserved. Partitioned NRT Event Processing Kafka Initial Events Topic Event Processing Logic Local Cache HBase Client Kafka Answer Topic HBase KafkaConsumer KafkaProducer Topic Partition A Partition B Partition C Producer Partitioner Producer Partitioner Producer Partitioner Custom Partitioner Better use of local memory
  • 41. ©2014 Cloudera, Inc. All rights reserved. Questions? http://confluent.io @confluentInc @gwenshap gwen@confluent.io

Editor's Notes

  1. This gives me a lot of perspective regarding the use of Hadoop
  2. Topics are partitioned, each partition ordered and immutable. Messages in a partition have an ID, called Offset. Offset uniquely identifies a message within a partition
  3. Kafka retains all messages for fixed amount of time. Not waiting for acks from consumers. The only metadata retained per consumer is the position in the log – the offset So adding many consumers is cheap On the other hand, consumers have more responsibility and are more challenging to implement correctly And “batching” consumers is not a problem
  4. 3 partitions, each replicated 3 times.
  5. The choose how many replicas must ACK a message before its considered committed. This is the tradeoff between speed and reliability
  6. The choose how many replicas must ACK a message before its considered committed. This is the tradeoff between speed and reliability
  7. can read from one or more partition leader. You can’t have two consumers in same group reading the same partition. Leaders obviously do more work – but they are balanced between nodes We reviewed the basic components on the system, and it may seem complex. In the next section we’ll see how simple it actually is to get started with Kafka.