Low-Latency, Web-scale Fraud Prevention with Samza and Friends
To achieve better results in preventing fraud, one needs to process a lot more data: older data, by looking back a lot further in time, and richer, wider data, by integrating additional data sources such as social and connected data. Scaling such processing to web-scale volume and latency requires novel technologies and alternative thinking.

  1. Low-Latency, Web-scale Fraud Prevention with Samza and Friends
  2. Edi Bice (ebice@ebay.com), Senior Data Scientist at eBay Enterprise, leading R&D efforts in applying machine learning to fraud prevention and elsewhere.
  3. Commerce is getting more convenient and more complex, and so is fraud. To keep up, fraud prevention solutions need to process a lot more data:
     • Older data
       • Looking back a lot further in time
       • Older data is not effective excuse – home for the holidays?
     • Wider data
       • Using all available data sources
       • How wide can customer name possibly be?
     • Richer data
       • Social/unstructured data – people, places, interests
       • Connected data – who shipped to whom, where; email, devices, IP addresses
     • Faster data
       • Clickstream data – website click patterns
  4. Modern Fraud Prevention Architecture Requirements
     • Web scale capable (horizontal scaling using commodity hardware)
       • handle more actions and data for each user
       • handle more users and more volume from each user
       • handle more customers of all sizes (lowest processing cost)
     • Low latency (milliseconds, not hours)
       • card present, digital goods, gift cards, store pickup (in-store online shopping!?)
       • e-commerce physical goods? – no teleporting yet, so speed up what we can
       • process customer interactions in real time (personalization, loyalty, shopping experience)
       • dynamic order process (identification, authentication, tender presentation)
     • Fault tolerance
       • Commodity hardware is not without faults
       • Expect and design for routine failures – more like shift changes, or relay races
  5. Preventing fraud is all about detecting abnormal behavior. Normal behavior is not normal – we are all normal in our own abnormal special ways.
     • Typical customer profiling calculations
       • Transaction velocity (#txns_day) and change (#txns_day_1days/#txns_day_10days)
       • Amount velocity ($txns_day) and change ($txns_day_1days/$txns_day_10days)
     • Typical implementation and technologies
       1. Define sliding window interval (7 days, a month, 6 months?)
       2. For each live txn, pull matching txns (card, ...) from a single SQL DB within that sliding window
       3. Loop over pulled transactions, filtering based on timestamp to calculate change over sub-windows
     • Issues, Problems, Solutions?!
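The three "typical implementation" steps on slide 5 can be sketched in Python as below. This is a minimal illustration, not the author's code: an in-memory list stands in for the SQL database, and the function name `velocity_change`, the `(timestamp, amount)` tuple shape, and the 1-day and 10-day defaults are assumptions chosen to match the `_1days/_10days` ratios on the slide.

```python
from datetime import datetime, timedelta


def velocity_change(txns, now, short_days=1, long_days=10):
    """Compare per-day activity in a short sub-window with the full window.

    txns: iterable of (timestamp, amount) pairs for one card (hypothetical shape).
    Returns (txn_velocity_change, amount_velocity_change); an entry is None
    when the long window is empty or sums to zero.
    """
    long_start = now - timedelta(days=long_days)
    short_start = now - timedelta(days=short_days)

    # Step 2: pull matching txns inside the sliding window.
    # A list comprehension stands in for the SQL query here.
    window = [(ts, amt) for ts, amt in txns if long_start < ts <= now]

    # Step 3: loop over the pulled txns, filtering by timestamp
    # to isolate the short sub-window.
    recent = [(ts, amt) for ts, amt in window if ts > short_start]

    def ratio(short_total, long_total):
        if not long_total:
            return None
        # Equivalent to (short_total / short_days) / (long_total / long_days).
        return (short_total * long_days) / (long_total * short_days)

    txn_change = ratio(len(recent), len(window))
    amt_change = ratio(sum(a for _, a in recent), sum(a for _, a in window))
    return txn_change, amt_change


now = datetime(2015, 8, 10, 12, 0)
# One $10 txn per day at noon for ten days, plus two $100 txns this morning.
history = [(datetime(2015, 8, d, 12, 0), 10) for d in range(1, 11)]
history += [(datetime(2015, 8, 10, 9, 0), 100), (datetime(2015, 8, 10, 10, 0), 100)]
txn_change, amt_change = velocity_change(history, now)
```

Every live transaction re-reads and re-filters the whole window in this approach, so the work per transaction grows with the window length. That is the kind of issue the last bullet on the slide asks about.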
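  6. Streaming Analytics

     | CName    | Date   | $    | ShipAddr    | … | CTxns | CustAvgAmt                 | TxAmt_AvgAmt_Ratio | Shipping Addr Txns | Shipping Addr Avg Amount |
     |----------|--------|------|-------------|---|-------|----------------------------|--------------------|--------------------|--------------------------|
     | Edi Bice | 8/3/15 | 50   | 123 Main St |   | 1     | 50 = (50 + 0) / 1          | NA                 | 1                  | 50                       |
     | Edi Bice | 8/3/15 | 100  | 123 Main St |   | 2     | 75 = (100 + 50*1) / 2      | 2.0 = 100 / 50     | 2                  | 75                       |
     | Edi Bice | 8/4/15 | 150  | 123 Main St |   | 3     | 100 = (150 + 75*2) / 3     | 2.0 = 150 / 75     | 3                  | 100                      |
     | Edi Bice | 8/5/15 | 1500 | 999 Wall St |   | 4     | 450 = (1500 + 100*3) / 4   | 15.0 = 1500 / 100  | 1                  | 1500                     |

     New Avg Amt = (New Txn Amt + Curr Avg Amt * Curr Num Txns) / (Curr Num Txns + 1)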
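The streaming update on slide 6 can be written as a small Python sketch that keeps only a count and an average per key. The class name `RunningStats`, the `enrich` function, and the dictionary field names are hypothetical; the arithmetic is the formula from the slide, with the ratio computed against the average before the new transaction is folded in, as in the table.

```python
from collections import defaultdict


class RunningStats:
    """Count and running average for one key (customer or shipping address)."""

    def __init__(self):
        self.count = 0
        self.avg = 0.0

    def update(self, amount):
        # Ratio of this txn to the current average; None for the first txn
        # (shown as NA on the slide) or when the current average is zero.
        ratio = amount / self.avg if self.count and self.avg else None
        # New Avg = (New Amt + Curr Avg * Curr Num Txns) / (Curr Num Txns + 1)
        self.avg = (amount + self.avg * self.count) / (self.count + 1)
        self.count += 1
        return ratio


by_customer = defaultdict(RunningStats)
by_ship_addr = defaultdict(RunningStats)


def enrich(txn):
    """Update both running profiles and return the txn with derived columns."""
    cust = by_customer[txn["cname"]]
    addr = by_ship_addr[txn["ship_addr"]]
    ratio = cust.update(txn["amount"])
    addr.update(txn["amount"])
    return {
        **txn,
        "c_txns": cust.count,
        "cust_avg_amt": cust.avg,
        "tx_amt_avg_amt_ratio": ratio,
        "ship_addr_txns": addr.count,
        "ship_addr_avg_amt": addr.avg,
    }


# The four transactions from the slide's table.
rows = [
    {"cname": "Edi Bice", "date": "8/3/15", "amount": 50, "ship_addr": "123 Main St"},
    {"cname": "Edi Bice", "date": "8/3/15", "amount": 100, "ship_addr": "123 Main St"},
    {"cname": "Edi Bice", "date": "8/4/15", "amount": 150, "ship_addr": "123 Main St"},
    {"cname": "Edi Bice", "date": "8/5/15", "amount": 1500, "ship_addr": "999 Wall St"},
]
enriched = [enrich(r) for r in rows]
```

Each transaction costs a constant amount of work and no history is re-read, which is what makes the calculation suitable for a stream processor with local state.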
  7. Job pipelines: "Kafka, Samza, and the Unix philosophy of distributed data" by Martin Kleppmann
  8. Apache Kafka
     • Distributed, scalable, publish-subscribe messaging system
     • Persistent, high-throughput messaging
     • Designed for real-time activity stream data processing
  9. PreCog Samza Job Pipeline
     • Manifold (1-in-N-out) jobs
     • Risk-by-Y calc jobs
     • X-by-Y calc jobs
     • Assembly jobs
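The slide does not say how a manifold job works beyond "1-in-N-out". One plausible reading, offered here only as an assumption, is that each incoming transaction is re-emitted once per attribute of interest, keyed by that attribute, so that a downstream "X-by-Y" job sees all messages for a given Y in the same partition. The function name `manifold`, the `txns-by-…` topic names, and the field names below are all hypothetical.

```python
def manifold(txn, key_fields=("cname", "ship_addr")):
    """Fan one transaction out into one keyed message per key field.

    Yields (topic, key, message) triples. The topic naming scheme is an
    illustrative assumption, not taken from the slides.
    """
    for field in key_fields:
        yield (f"txns-by-{field}", txn[field], txn)


txn = {"cname": "Edi Bice", "amount": 1500, "ship_addr": "999 Wall St"}
messages = list(manifold(txn))
```

Under this reading, the per-customer and per-shipping-address columns on slide 6 would each be computed by a separate job consuming one of the keyed outputs.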
  10. Fault-tolerant local state
      [Diagram: Samza job partitions 0 and 1, each with a local RocksDB store that replicates its writes to a durable changelog in Kafka]
      • Embedded key-value: very fast
      • Machine dies ⇒ local key-value store is lost
      • Solution: replicate all writes to Kafka!
      • Machine dies ⇒ restart on another machine
      • Restore key-value store from changelog
      • Changelog compaction in the background
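The changelog idea on slide 10 can be illustrated with a toy Python model. This is not Samza's implementation: a `dict` stands in for the local RocksDB store, a plain list stands in for the Kafka changelog topic, and the names `ChangelogStore`, `restore`, and `compact` are hypothetical.

```python
class ChangelogStore:
    """Local key-value store that mirrors every write to a changelog."""

    def __init__(self, changelog):
        self.local = {}             # stand-in for the embedded RocksDB store
        self.changelog = changelog  # stand-in for a durable Kafka topic

    def put(self, key, value):
        self.local[key] = value
        # Replicate the write so the state survives loss of this machine.
        self.changelog.append((key, value))

    def get(self, key):
        return self.local.get(key)

    @classmethod
    def restore(cls, changelog):
        """Rebuild local state on a new machine by replaying the changelog."""
        store = cls(changelog)
        for key, value in changelog:
            store.local[key] = value  # later entries overwrite earlier ones
        return store


def compact(changelog):
    """Keep only the latest entry per key, preserving offset order."""
    latest = {}
    for offset, (key, value) in enumerate(changelog):
        latest[key] = (offset, value)
    ordered = sorted(latest.items(), key=lambda item: item[1][0])
    return [(key, value) for key, (_, value) in ordered]


topic = []
store = ChangelogStore(topic)
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)

# Simulate the machine dying: only the changelog survives.
recovered = ChangelogStore.restore(topic)
compacted = compact(topic)
```

Replaying the compacted log yields the same state as replaying the full log, which is why compaction can run in the background without affecting recovery.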
  11. Samza Jobs on Hadoop 2.0 (YARN)
      [Diagram: Machines 1, 2, and 3, each running a YARN Node Manager and a Kafka Broker; a Samza App Master; Samza TaskRunners for partitions 1, 2, and 3, each calling aStreamTask:process()]
  12. Monitoring Samza: Metrics and More
      Samza JMX metrics → jmxtrans → OpenTSDB/HBase → Grafana
  13. Questions?
      • http://www.ebayenterprise.com/
      • ebice@ebay.com
      • @edi_bice
      • https://www.linkedin.com/in/ebice