© 2016 Feedzai Confidential 1
@antonioalegria
Product Lead for Cloud
Machine Learning:
Building Successful Products at Scale
How to stop a robbery?
Arm the good guys
How to build the machine?
This talk is about
Tips on building a successful Machine Learning product
One that works on multiple use cases within the same ecosystem
One that works on structured data and in classification use cases
Challenges with doing this for a generic SaaS Fraud Prevention product
Data API design choices
Taking advantage of the data to power Machine Learning
This talk is NOT about
Unstructured data problems
Natural Language Processing
Image Recognition
Speech Recognition
Virtual Assistants
Self-driving cars
Specific technologies
Dataviz or UI
Machine Learning Products
Examples of ML Products
Gmail Spam detection
Recommendations:
Movies
Books
Music
Dating
Automating Access Control @ Amazon
Predicting Heart Attacks / Diseases on a Fitness Tracking Service
Fraud Detection ;-)
What makes them good?
They do their job very well
They blend into the environment
They self-evolve
They seem magical
Fraud Detection
Feedzai in a Nutshell
What? Detect fraudulent payments, and the customers behind them, in real time
How?
We receive transaction and behavior data
We continuously update 1:1 profiles for every entity (e.g. cards, IPs, merchants, etc.)
A Machine Learning model analyses each payment and its history in real time
The user receives scores immediately, with human-readable explanations
The user can give feedback by labeling transactions as “ok” or “fraud”, and our models learn from it automatically
Where? Deployed on-site or used in the cloud through a REST API
When? Our AI never sleeps and it responds in a few milliseconds
Huge Data
Securing over $2B per day (over $700B/year, 3.3x Portugal’s GDP)
Growing soon to reach trillion scale
Fighting crime across the globe
US – our clients use Feedzai to process $4 of every $10 in all US
Canada
Brazil
India
Nigeria
Europe
We have to make decisions in 25ms
DATA-DRIVEN FRAUD DETECTION
[Diagram. Labels: Request (Payments & Actions: $ € ¥ £), Data Enrichment, 1:1 Profiling & Analytics, Machine Learning, Human-built Rules, Risk Analysis, Decision (Approve, Decline, Review), Response, User Feedback]
White-box Scoring
Human explanations from AI reasoning
Challenges
SaaS for Online Commerce
Fraud Prevention in Online Commerce is a very broad scenario
Different geographies
Widely different use cases
Fraud and abuse are a case of extremely unbalanced classes
It can involve many abuse scenarios:
Payment fraud (e.g. stolen credit cards)
Account Takeovers
Money Laundering
Abusing employee benefits
Solution needs to SCALE
Scale…
I don’t think it means what you think it means
Let’s look at some of the key components of a good ML product
The Data API
Data API Specificity Spectrum
From most specific to most generic: API Flexibility increases along the spectrum while Model Shareability decreases.

Very use-case specific (e.g. Email spam detection)
• Very specific to a particular use case
• Strict validations
• Clients need to fully adapt to the API
+ Shared Models
+ Shared Model Features
+ Very easy to fully automate and scale
– Low adaptation to clients

Platform for classes of use cases (e.g. Feedzai for Online Commerce)
• API defines common “language”
• Comprehensive set of well-defined optional fields
• Supports custom fields and events
• Clients adapt to the Native fields but can use custom data
+ Potential for high adaptation to clients
+ Tiered model possible (shared + specific)
+ Shared and specific model features
– Automation and scaling is not trivial

Generic ML Platform (e.g. BigML)
• Defines bare minimum generic terms
• Clients have full flexibility to integrate
• Custom events with custom data fields
+ Potential for total adaptation to clients
– Costly adaptation to clients
– Hard to do feature engineering
– Fully Separate Models
– Fully Separate Model Features
Responses should include the following
Score(s) (e.g. probability of being fraud)
Decision:
Accept
Review
Decline
Human-Readable Explanations
Machine-Readable Reason Codes
Feedzai API Example

Request

POST /v1.1/payments
{
  "id": "1477020120",
  "user_id": "af00-bc14-1245",
  "amount": 280000,
  "currency": "USD",
  "ip": "212.10.114.18",
  "items": [
    {
      "item_id": "cell_400200",
      "name": "Cellphone 1450",
      "price": 25000
    }
  ],
  "payment_methods": [
    {
      "type": "card",
      "card_fullname": "HUGH Howey",
      "card_pan": "4539488752989912",
      "card_exp": "06/17"
    }
  ],
  "user_defined": {
    "is_po_box": true,
    "expedited_delivery": true
  }
}

Response

HTTP 200 (OK)
{
  "id": "1477020120",
  "score": 740,
  "decision": "review",
  "reason_codes": [
    { "name": "Fraud" },
    { "name": "MoneyLaundering" },
    { "name": "AccountTakeover" }
  ],
  "explanation": [
    {
      "description": "Customer used over 3 cards in past week.",
      "risk": 0.4,
      "confidence": 5
    },
    {
      "description": "Customer has used a single internet address in the last 24 hours.",
      "risk": 0.003,
      "confidence": 5
    }
  ]
}
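A client could consume a response like the one in the example roughly as follows. This is a minimal Python sketch, not Feedzai's client library: the helper `route_payment` and the action names are hypothetical, and only the field names `id`, `decision`, `reason_codes` and `explanation` come from the example.

```python
import json

# Response body taken from the example (trimmed to one explanation).
RESPONSE_BODY = """
{
  "id": "1477020120",
  "score": 740,
  "decision": "review",
  "reason_codes": [
    {"name": "Fraud"},
    {"name": "MoneyLaundering"},
    {"name": "AccountTakeover"}
  ],
  "explanation": [
    {"description": "Customer used over 3 cards in past week.",
     "risk": 0.4, "confidence": 5}
  ]
}
"""

# Hypothetical mapping from the API decision to a merchant-side action.
ACTIONS = {
    "accept": "fulfil_order",
    "review": "queue_for_analyst",
    "decline": "reject_order",
}


def route_payment(response: dict) -> dict:
    """Turn a scoring response into an action for the merchant's system."""
    decision = response["decision"].lower()
    if decision not in ACTIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    return {
        "payment_id": response["id"],
        "action": ACTIONS[decision],
        # Machine-readable reason codes can drive automation...
        "reason_codes": [c["name"] for c in response.get("reason_codes", [])],
        # ...while the descriptions are meant for a human reviewer.
        "notes": [e["description"] for e in response.get("explanation", [])],
    }


routed = route_payment(json.loads(RESPONSE_BODY))
print(routed["action"], routed["reason_codes"])
```

Keeping the decision, the reason codes and the explanations as separate fields lets automated systems and human analysts each read the part meant for them.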
The Machine Learning Algorithm
(in 30 seconds)
Machine Learning Algorithm
Encapsulate the actual ML into an isolated component
Start with algorithms that are fast to train and evaluate, adapt to different use cases, support classification and regression, and are white-box
Random Forest
Gradient Boosting Machines (GBMs)
Deep Learning shows potential for more unstructured problems
Though it’s heavyweight to train and requires a lot of pre-processing
Still unclear how much it can “replace” feature engineering
Machine Learning is 90% Data Processing
Naïve ML Pipeline
[Diagram. In production: Live Input Data → Enrich, Filter, Transform, Aggregate, Project → Instance Vector → Classify (Machine Learning). Training: Historical Input Data → Enrich, Filter, Transform, Aggregate, Project → Instance Vector + Class Annotation → Training.]
Example Input Data
Transaction:
Amount
Currency
User ID
User Name
Credit Card Number
Cardholder Name
IP Address
Example Naïve Features
Amount in USD
Currency
Time of Day
Day of Week
Is IP a Proxy
IP Country == Store Country
IP Country
User ID
Card
Device ID
Some of these features are awful (don’t do this):
High-cardinality categoricals such as User ID, Card and Device ID are bad and lead to overfitting
Also, the model isn’t learning patterns, just which users/devices/cards are bad
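The naïve features could be derived from a single event along these lines. This is a sketch under stated assumptions: the helper `naive_features`, the input field names (`timestamp`, `ip_is_proxy`, `ip_country`, `store_country`) and the static `USD_RATES` table are all hypothetical. The raw identifiers are deliberately left out of the output.

```python
from datetime import datetime, timezone

# Assumed static conversion rates, for illustration only.
USD_RATES = {"USD": 1.0, "EUR": 1.1}


def naive_features(trx: dict) -> dict:
    """Per-event features that need no history (hypothetical field names)."""
    ts = datetime.fromtimestamp(trx["timestamp"], tz=timezone.utc)
    return {
        "amount_usd": trx["amount"] * USD_RATES[trx["currency"]],
        "currency": trx["currency"],
        "hour_of_day": ts.hour,
        "day_of_week": ts.weekday(),  # Monday == 0
        "ip_is_proxy": trx["ip_is_proxy"],
        "ip_country": trx["ip_country"],
        "ip_country_matches_store": trx["ip_country"] == trx["store_country"],
        # user_id, card and device_id are NOT emitted as features: as raw
        # categoricals they let the model memorise individual entities
        # instead of learning patterns.
    }


example = naive_features({
    "timestamp": 1477020120,  # 2016-10-21 03:22 UTC, a Friday
    "amount": 100.0,
    "currency": "EUR",
    "ip_is_proxy": False,
    "ip_country": "PT",
    "store_country": "US",
    "user_id": "af00-bc14-1245",
})
print(example)
```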
The model won’t be very good
It can’t distinguish between two identical transactions made by two different people.
We need to go further
Goal: the model must see
The current event
+
All* past events
+
All* related events (e.g. same card)
A good approximation of this is a set of 1:1 Profiles
What’s a profile?
An aggregation or summarization of events over a certain time window and for a group of entities
Examples:
Number of transactions in last 24h for this card
Number of transactions in past month for this customer with this card
Number of distinct cards for this customer
Last 5 card countries used
1:1 refers to the fact that profiles are tracked per specific entity
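Three of the example profiles can be written as plain functions over a list of past events. This is an illustrative batch sketch, not an efficient or real-time implementation. The event fields (`ts` in seconds, `user_id`, `card`, `card_country`) and the function names are assumptions.

```python
DAY = 24 * 60 * 60  # seconds


def card_count_24h(events, card, now):
    """Number of transactions in the last 24h for this card."""
    return sum(1 for e in events
               if e["card"] == card and now - DAY < e["ts"] <= now)


def distinct_cards(events, user_id):
    """Number of distinct cards used by this customer."""
    return len({e["card"] for e in events if e["user_id"] == user_id})


def last_card_countries(events, user_id, n=5):
    """Card countries of the customer's last n transactions, newest first."""
    mine = sorted((e for e in events if e["user_id"] == user_id),
                  key=lambda e: e["ts"], reverse=True)
    return [e["card_country"] for e in mine[:n]]


events = [
    {"ts": 0,      "user_id": "u1", "card": "c1", "card_country": "PT"},
    {"ts": 50_000, "user_id": "u1", "card": "c2", "card_country": "US"},
    {"ts": 90_000, "user_id": "u1", "card": "c1", "card_country": "PT"},
    {"ts": 95_000, "user_id": "u2", "card": "c3", "card_country": "BR"},
]

print(card_count_24h(events, "c1", now=100_000))
print(distinct_cards(events, "u1"))
print(last_card_countries(events, "u1"))
```

Each function is keyed by a specific entity (a card or a customer), which is the sense of "1:1" used here.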
Characteristics of a Profile
It’s applied over a data window (usually time-based)
Sliding
Tumbling
Delayed
It has a set of dimensions or entities to group by
It has an aggregation function
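The three characteristics map naturally onto a small data structure. The sketch below is only illustrative: the `Profile` class and its field names are hypothetical, and the "delayed" window is interpreted here as a sliding window that ends some time before the current event, which is an assumption about the term.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


def sliding_window(now, size):
    """Window that always ends at the current event: (now - size, now]."""
    return (now - size, now)


def tumbling_window(now, size):
    """Fixed, non-overlapping bucket containing `now`: [start, start + size)."""
    start = (now // size) * size
    return (start, start + size)


def delayed_window(now, size, delay):
    """Sliding window shifted into the past (assumed meaning of 'delayed')."""
    return (now - delay - size, now - delay)


@dataclass(frozen=True)
class Profile:
    name: str
    window_seconds: int                # the data window
    group_by: Sequence[str]            # the dimensions / entities
    aggregate: Callable[[list], float]  # the aggregation function
    value_field: str

    def compute(self, events, current):
        """Aggregate past events that share the current event's entities."""
        lo, hi = sliding_window(current["ts"], self.window_seconds)
        values = [e[self.value_field] for e in events
                  if lo < e["ts"] <= hi
                  and all(e[k] == current[k] for k in self.group_by)]
        return self.aggregate(values)


events = [
    {"ts": 1000, "card": "c1", "amount": 10.0},
    {"ts": 5000, "card": "c1", "amount": 20.0},
    {"ts": 7000, "card": "c2", "amount": 99.0},
    {"ts": 7250, "card": "c1", "amount": 5.0},
]
amount_1h = Profile("amount_1h_per_card", 3600, ("card",), sum, "amount")
print(amount_1h.compute(events, events[-1]))
```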
Challenges
How do you calculate these profiles continuously and in real-time?
How do you calculate profiles for both short-term and long-term windows?
How do you reproduce exactly the same processing in training, testing and production?
How do you make it so that Data Scientists can ship something to production without Developers’ intervention?
How do you easily “code-review” it?
Reproducibility between Training and Production is essential
This is the most important thing
Reproducibility
Without a training pipeline that mirrors the real-time one, the model learns from data that differs from what it will see in production
This mismatch (in effect a self-inflicted concept drift, often called training-serving skew) can kill your model’s performance
You can avoid this in two ways:
Have a very strict (and slow) process of testing and QA
Or use the same code during training, testing and production
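The second option can be sketched as a single stateful feature class that is replayed over history for training and fed live events in production. This is an assumed design for illustration, not Feedzai's implementation. `FeaturePipeline`, `build_training_set` and `score_live` are hypothetical names, and the single profile (prior transaction count per card) stands in for real feature logic.

```python
class FeaturePipeline:
    """Stateful feature logic shared by training and production."""

    def __init__(self):
        self._count_by_card = {}

    def process(self, event):
        """Update state with one event and return its feature vector."""
        card = event["card"]
        seen_before = self._count_by_card.get(card, 0)
        self._count_by_card[card] = seen_before + 1
        return {"amount": event["amount"], "prior_trx_for_card": seen_before}


def build_training_set(historical_events):
    """Training path: replay history in time order through a fresh pipeline."""
    pipeline = FeaturePipeline()
    ordered = sorted(historical_events, key=lambda e: e["ts"])
    return [(pipeline.process(e), e["is_fraud"]) for e in ordered]


def score_live(pipeline, model, event):
    """Production path: the same class, fed one event at a time."""
    return model(pipeline.process(event))


history = [
    {"ts": 2, "card": "c1", "amount": 30, "is_fraud": True},
    {"ts": 1, "card": "c1", "amount": 10, "is_fraud": False},
    {"ts": 3, "card": "c2", "amount": 5, "is_fraud": False},
]
training_rows = build_training_set(history)

# Feeding the same events through the live path yields identical vectors.
live = FeaturePipeline()
live_vectors = [live.process(e) for e in sorted(history, key=lambda e: e["ts"])]
print(live_vectors == [features for features, _ in training_rows])
```

Because both paths call the same `process` method, a change to the feature logic cannot reach production without also changing what the model was trained on.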
Data Scientists must be able to “code”, test and ship Feature Engineering logic to Production
(without Software Engineers having to implement it from a spec)
Complex Event Processing
+
Large Scale Data Processing Platforms
Complex Event Processing
A data processing methodology and a family of stream-based technologies
Relies on a DSL, sometimes similar to SQL
Instead of applying queries/logic to stored data, the data flows through in-memory queries that update state immediately
Example
SELECT user_id,
       card,
       avg(amount) AS avg_amount,
       count() AS num_trx,
       count() / (last().timestamp - first().timestamp) AS velocity
FROM transactions[24 hours]
GROUP BY user_id, card;
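To make the "data flows through the query" idea concrete, here is a rough Python equivalent of the query. It keeps an in-memory 24-hour window per (user_id, card) and updates it on every event. This is not how any particular CEP engine implements it. The class name is hypothetical, events are assumed to arrive in timestamp order, and velocity is read as the count divided by the time span between the first and last event in the window.

```python
from collections import defaultdict, deque

WINDOW = 24 * 60 * 60  # 24 hours, in seconds


class TransactionWindow:
    """In-memory state for the query, grouped by (user_id, card)."""

    def __init__(self):
        self._events = defaultdict(deque)  # key -> deque of (ts, amount)

    def on_event(self, trx):
        """Push one event through the 'query' and return the updated row."""
        key = (trx["user_id"], trx["card"])
        window = self._events[key]
        window.append((trx["timestamp"], trx["amount"]))
        # Evict events that have slid out of the 24-hour window.
        while window[0][0] <= trx["timestamp"] - WINDOW:
            window.popleft()
        num_trx = len(window)
        span = window[-1][0] - window[0][0]
        return {
            "user_id": trx["user_id"],
            "card": trx["card"],
            # Recomputing the sum is O(n); a real engine would keep it running.
            "avg_amount": sum(amount for _, amount in window) / num_trx,
            "num_trx": num_trx,
            # Undefined when the window spans zero seconds (single event).
            "velocity": num_trx / span if span > 0 else None,
        }


cep = TransactionWindow()
first = cep.on_event({"user_id": "u", "card": "c", "timestamp": 0, "amount": 10.0})
second = cep.on_event({"user_id": "u", "card": "c", "timestamp": 3600, "amount": 30.0})
third = cep.on_event({"user_id": "u", "card": "c", "timestamp": 90000, "amount": 20.0})
print(second)
```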
Common CEP Operations
Filtering
Correlation
Windowing
Transformation
Aggregation/Grouping
Merging/Union
Sorting
Pattern Detection
Complex Event Processing at Scale
CEP technology usually relies on in-memory processing
To handle long-term profiles you need to pair it with distributed data processing platforms
The ability to replay historical data exactly as it would be processed in production should be a core requirement for the whole system
Other Tips
Support 0-downtime deployment of new models in staging mode
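One way to read "staging mode" is shadow scoring: the new model scores live traffic next to the current one, but only the current model's score is returned until the new model is promoted. The sketch below follows that reading, which is an assumption and not a description of Feedzai's deployment mechanism. `ModelRouter` and its methods are hypothetical, and models are plain callables.

```python
import threading


class ModelRouter:
    """Serves the live model while a staged model scores in the shadows."""

    def __init__(self, live_model):
        self._lock = threading.Lock()
        self._live = live_model
        self._staged = None
        self.shadow_log = []  # (live_score, staged_score) pairs

    def stage(self, model):
        """Start shadow-scoring with a new model; responses are unaffected."""
        with self._lock:
            self._staged = model

    def promote(self):
        """Swap the staged model in without interrupting scoring."""
        with self._lock:
            if self._staged is None:
                raise RuntimeError("no staged model to promote")
            self._live, self._staged = self._staged, None

    def score(self, features):
        with self._lock:
            live, staged = self._live, self._staged
        live_score = live(features)
        if staged is not None:
            # Recorded for offline comparison, never returned to the caller.
            self.shadow_log.append((live_score, staged(features)))
        return live_score


router = ModelRouter(lambda features: 0.1)
router.stage(lambda features: 0.9)
before = router.score({"amount": 10})
router.promote()
after = router.score({"amount": 10})
print(before, after, router.shadow_log)
```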
Good (consistent) Data > Lots of Data
Do things that don’t scale
• Look at specific data rows
• Open the CSVs
• Use SQL to try to find new insights
Throw away good data
(wait, what?)
> 99.5% Good Transactions
< 0.5% Fraud
Fraud data is extremely unbalanced
Use undersampling to drop good transactions from the training set
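Random undersampling can be sketched in a few lines: keep every fraud row and only a random subset of the good ones. The function name, the `is_fraud` label key and the 10:1 target ratio are assumptions for illustration. The right ratio depends on the data.

```python
import random


def undersample(rows, goods_per_fraud=10, seed=0, label_key="is_fraud"):
    """Keep all fraud rows plus a random sample of the good rows."""
    fraud = [r for r in rows if r[label_key]]
    good = [r for r in rows if not r[label_key]]
    keep = min(len(good), goods_per_fraud * len(fraud))
    rng = random.Random(seed)  # seeded so the training set is reproducible
    sampled = fraud + rng.sample(good, keep)
    rng.shuffle(sampled)
    return sampled


# 1000 transactions, 0.5% of them fraud.
rows = [{"id": i, "is_fraud": i < 5} for i in range(1000)]
training_rows = undersample(rows)
print(len(training_rows), sum(r["is_fraud"] for r in training_rows))
```

A model trained on the resampled data sees far more fraud than production traffic contains, so its raw scores tend to overstate the fraud probability and may need recalibration.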
Key Takeaways
Design APIs with comprehensive native fields but allow custom data
Data Processing is 90% of Machine Learning
Must Have: full reproducibility of production behavior offline and for training
Combine CEP and streaming with distributed batch processing
Combine Machine Learning with Human Intelligence
MACHINE LEARNING
MISSION
Keep commerce safe and create a better customer experience through machine learning.
QUICK FACTS
• Top 50 High Growth startups in Europe, FASTEST GROWING startup in Portugal
• Founded by data scientists and aerospace engineers in 2009
• 120+ employees and doubling
• Offices in Portugal, Silicon Valley, New York City, London
• Series B funded by Oak HC/FT and Sapphire Ventures (SAP)
WHAT OTHERS SAY
• The U.S. market fraud prevention just got a new player.
• Feedzai’s machine learning is the next wave.
• Ranked as a cool technology to watch.
• Startups that are owning the data game.
• Payment Card Management: Essential tools for U.S. card issuers
Want to be a data Samurai?
We’re hiring! feedzai.com/about-us/careers

Pixels.camp - Machine Learning: Building Successful Products at Scale

  • 1. Machine Learning: Building Successful Products at Scale. @antonioalegria, Product Lead for Cloud. © 2016 Feedzai Confidential
  • 2. How to stop a robbery?
  • 4. How to build the machine?
  • 5. This talk is about: tips on building a successful Machine Learning product; one that works on multiple use cases within the same ecosystem; one that works on structured data and in classification use cases; the challenges of doing this for a generic SaaS Fraud Prevention product; Data API design choices; taking advantage of the data to power Machine Learning.
  • 6. This talk is NOT about: unstructured data problems; Natural Language Processing; Image Recognition; Speech Recognition; Virtual Assistants; self-driving cars; specific technologies; dataviz or UI.
  • 7. Machine Learning Products
  • 8. Examples of ML Products: Gmail spam detection; recommendations (movies, books, music, dating); automating access control @ Amazon; predicting heart attacks / diseases on a fitness tracking service; fraud detection ;-)
  • 9. What makes them good?
  • 10. They do their job very well
  • 11. They blend into the environment
  • 12. They self-evolve
  • 13. They seem magical
  • 14. Fraud Detection
  • 15. Feedzai in a Nutshell. What? Detect fraudulent payments and their customers, in real-time. How? We receive transaction and behavior data; we continuously update 1:1 profiles for every entity (e.g. cards, IPs, merchants); a Machine Learning model analyses each payment and its history in real-time; the user receives scores immediately, with human-readable explanations; the user can give feedback by labeling transactions as “ok” or “fraud”, and our models will learn automatically. Where? Deployed on-site or used in the cloud through a REST API. When? Our AI never sleeps and it responds in a few milliseconds.
  • 16. Huge Data. Securing over $2B per day (over $700B/year, 3.3x Portugal’s GDP). Growing soon to reach trillion scale. Fighting crime across the globe: US (our clients use Feedzai to process $4 of every $10 in all US), Canada, Brazil, India, Nigeria, Europe. We have to make decisions in 25ms.
  • 17. Data-Driven Fraud Detection (diagram). Labels shown: Payments & Actions; 1:1 Profiling & Analytics; Data Enrichment; Machine Learning; Human-built Rules; Risk Analysis; Decision (Approve, Decline, Review); User Feedback; Request; Response.
  • 18. White-box Scoring: human explanations from AI reasoning
  • 19. Challenges: SaaS for Online Commerce. Fraud Prevention in Online Commerce is a very broad scenario: different geographies; widely different use cases; fraud and abuse are a case of extremely unbalanced classes. It can involve many abuse scenarios: payment fraud (e.g. stolen credit cards); account takeovers; money laundering; abuse of employee benefits. The solution needs to SCALE.
  • 20. Scale… I don’t think it means what you think it means
  • 21. Let’s look at some of the key components of a good ML product
  • 22. The Data API
  • 23. Data API Specificity Spectrum (axes: API Flexibility, Model Shareability)
      • Very use-case specific (e.g. Email spam detection): very specific to a particular use case; strict validations; clients need to fully adapt to the API. Pros: shared models; shared model features; very easy to fully automate and scale. Cons: low adaptation to clients.
      • Platform for classes of use cases (e.g. Feedzai for Online Commerce): API defines a common “language”; comprehensive set of well-defined optional fields; supports custom fields and events; clients adapt to the native fields but can use custom data. Pros: potential for high adaptation to clients; tiered model possible (shared + specific); shared and specific model features. Cons: automation and scaling is not trivial.
      • Generic ML Platform (e.g. BigML): defines bare minimum generic terms; clients have full flexibility to integrate; custom events with custom data fields. Pros: potential for total adaptation to clients. Cons: costly adaptation to clients; hard to do feature engineering; fully separate models; fully separate model features.
  • 24. Responses should include the following: score(s) (e.g. probability of being fraud); a decision (Accept, Review or Decline); human-readable explanations; machine-readable reason codes.
  • 25. Feedzai API Example
      Request:
        POST /v1.1/payments
        {
          "id": "1477020120",
          "user_id": "af00-bc14-1245",
          "amount": 280000,
          "currency": "USD",
          "ip": "212.10.114.18",
          "items": [
            { "item_id": "cell_400200", "name": "Cellphone 1450", "price": 25000 }
          ],
          "payment_methods": [
            { "type": "card", "card_fullname": "HUGH Howey", "card_pan": "4539488752989912", "card_exp": "06/17" }
          ],
          "user_defined": { "is_po_box": true, "expedited_delivery": true }
        }
      Response:
        HTTP 200 (OK)
        {
          "id": "1477020120",
          "score": 740,
          "decision": "review",
          "reason_codes": [
            { "name": "Fraud" },
            { "name": "MoneyLaundering" },
            { "name": "AccountTakeover" }
          ],
          "explanation": [
            { "description": "Customer used over 3 cards in past week.", "risk": 0.4, "confidence": 5 },
            { "description": "Customer has used a single internet address in the last 24 hours.", "risk": 0.003, "confidence": 5 }
          ]
        }
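
A client consuming the response on slide 25 might route each payment on the `decision` field and keep the reason codes for downstream systems. The following is a minimal sketch, not Feedzai client code: the `route_payment` function and the action names are hypothetical, and only the field names come from the slide.

```python
import json

# Response body copied from the slide-25 example (quotes normalized).
RESPONSE_BODY = """
{
  "id": "1477020120",
  "score": 740,
  "decision": "review",
  "reason_codes": [
    {"name": "Fraud"},
    {"name": "MoneyLaundering"},
    {"name": "AccountTakeover"}
  ],
  "explanation": [
    {"description": "Customer used over 3 cards in past week.",
     "risk": 0.4, "confidence": 5},
    {"description": "Customer has used a single internet address in the last 24 hours.",
     "risk": 0.003, "confidence": 5}
  ]
}
"""

# Hypothetical client-side actions; the slides only name the three decisions.
ACTIONS = {
    "accept": "fulfil_order",
    "review": "queue_for_analyst",
    "decline": "reject_order",
}


def route_payment(body: str) -> dict:
    """Map a scoring response to a client-side action."""
    response = json.loads(body)
    decision = response["decision"].lower()
    if decision not in ACTIONS:
        raise ValueError(f"unexpected decision: {decision!r}")
    # Surface the explanation with the highest risk contribution first.
    top = max(response["explanation"], key=lambda e: e["risk"])
    return {
        "payment_id": response["id"],
        "action": ACTIONS[decision],
        "reason_codes": [code["name"] for code in response["reason_codes"]],
        "top_explanation": top["description"],
    }


result = route_payment(RESPONSE_BODY)
print(result["action"], result["reason_codes"])
```

Keeping the machine-readable reason codes separate from the human-readable explanations lets automation branch on the former while analysts read the latter.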
  • 26. The Machine Learning Algorithm (in 30 seconds)
  • 27. Machine Learning Algorithm. Encapsulate the actual ML into an isolated component. Start with algorithms that are fast to train and evaluate, adapt to different use cases, support classification and regression, and are white-box: Random Forest; Gradient Boosting Machines (GBMs). Deep Learning shows potential for more unstructured problems, though it’s heavyweight to train and requires a lot of pre-processing; it is still unclear how much it can “replace” feature engineering.
  • 28. Machine Learning is 90% Data Processing
  • 29. Machine Learning In Production: Naïve ML Pipeline (diagram). Training path: Historical Input Data → Enrich, Filter, Transform, Aggregate, Project → Instance Vector + Class Annotation → Training. Production path: Live Input Data (with Historical Data) → Enrich, Filter, Transform, Aggregate, Project → Instance Vector → Classify.
  • 30. Example Input Data. Transaction: Amount; Currency; User ID; User Name; Credit Card Number; Cardholder Name; IP Address.
  • 31. Example Naïve Features: Amount in USD; Currency; Time of Day; Day of Week; Is IP a Proxy; IP Country == Store Country; IP Country; User ID; Card; Device ID. Some of these features are awful (don’t do this): high-cardinality categoricals such as User ID, Card and Device ID lead to overfitting, and the model doesn’t learn patterns, just which users/devices/cards are bad.
  • 32. The model won’t be very good
  • 33. It can’t distinguish between two equal transactions from two different people. We need to go further
  • 34. Goal: the model must see the current event + all* past events + all* related events (e.g. same card)
  • 35. A good approximation of this is 1:1 Profiles
  • 36. What’s a profile? An aggregation or summarization of events over a certain time window and for a group of entities. Examples: number of transactions in the last 24h for this card; number of transactions in the past month for this customer with this card; number of distinct cards for this customer; last 5 used card countries. 1:1 refers to the fact that profiles are tracked per specific entity.
  • 37. Characteristics of a Profile. It’s applied over a (usually time) data window: sliding, tumbling or delayed. It has a set of dimensions or entities to group by. It has an aggregation function.
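
The three characteristics on slide 37 (window, grouping key, aggregation function) can be illustrated with a count-per-card profile. This is a minimal in-memory sketch that assumes events arrive in timestamp order; `SlidingCountProfile` and `tumbling_counts` are illustrative names, not part of any Feedzai API.

```python
from collections import defaultdict, deque

DAY = 24 * 60 * 60  # window length in seconds


class SlidingCountProfile:
    """Count of events per entity key over a sliding time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.timestamps = defaultdict(deque)  # key -> timestamps still in window

    def update(self, key, ts):
        """Register an event and return the count in (ts - window, ts]."""
        queue = self.timestamps[key]
        queue.append(ts)
        # Evict events that have slid out of the window.
        while queue[0] <= ts - self.window:
            queue.popleft()
        return len(queue)


def tumbling_counts(events, window_seconds):
    """Count events per (key, bucket) over fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for key, ts in events:
        bucket_start = ts - ts % window_seconds
        counts[(key, bucket_start)] += 1
    return dict(counts)


events = [("card-A", 0), ("card-A", 3600), ("card-A", 100000)]

profile = SlidingCountProfile(DAY)
sliding = [profile.update(key, ts) for key, ts in events]
tumbling = tumbling_counts(events, DAY)
print(sliding, tumbling)
```

The sliding profile answers "how many transactions for this card in the last 24h" at every event, while the tumbling variant only yields one value per fixed bucket, which is cheaper but coarser.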
  • 38. Challenges. How do you calculate these profiles continuously and in real-time? How do you calculate profiles for both short-term and long-term windows? How do you reproduce exactly the same processing in training, testing and production? How do you make it so that Data Scientists can ship something to production without Developers’ intervention? How do you easily “code-review” it?
  • 39. Reproducibility between Training and Production is essential. This is the most important thing
  • 40. Reproducibility. Without a training pipeline that mirrors real-time processing, the model will learn something different from what it will see in reality. This kind of mismatch between training and production, similar in effect to concept drift, can kill your model’s performance. You can fix this in two ways: have a very strict (and slow) process of testing and QA, or use the same code during training, testing and in production.
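
The second option on slide 40, using the same code everywhere, can be sketched as a single stateful feature pipeline that is replayed over history for training and fed live events in production. The class and function names below are hypothetical, and the two features (prior transaction count and prior average amount per card) are chosen only for illustration.

```python
from collections import defaultdict


class FeaturePipeline:
    """Stateful feature logic shared by training replay and live scoring."""

    def __init__(self):
        self.tx_count = defaultdict(int)
        self.amount_sum = defaultdict(float)

    def process(self, event):
        """Return the feature vector for an event, then update the state."""
        card = event["card"]
        prev_count = self.tx_count[card]
        prev_avg = self.amount_sum[card] / prev_count if prev_count else 0.0
        features = [event["amount"], prev_count, prev_avg]
        # State is updated after the features are built, so each vector only
        # reflects events that happened before the current one.
        self.tx_count[card] += 1
        self.amount_sum[card] += event["amount"]
        return features


def build_training_set(history):
    """Replay historical events through the same pipeline used in production."""
    pipeline = FeaturePipeline()
    return [(pipeline.process(event), event["is_fraud"]) for event in history]


history = [
    {"card": "A", "amount": 10.0, "is_fraud": False},
    {"card": "A", "amount": 30.0, "is_fraud": True},
    {"card": "B", "amount": 20.0, "is_fraud": False},
]

training_rows = build_training_set(history)

# In production the very same class consumes live events one at a time.
live_pipeline = FeaturePipeline()
live_vectors = [live_pipeline.process(event) for event in history]
print(training_rows[1][0], live_vectors[1])
```

Because both paths run the identical `process` method, the vectors the model trains on match what it would have seen live for the same event sequence.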
  • 41. Data Scientists must be able to “code”, test and ship Feature Engineering logic to Production (without Software Engineers having to implement it based on a spec)
  • 42. Complex Event Processing + Large Scale Data Processing Platforms
  • 43. Complex Event Processing. A data processing methodology and family of stream-based technologies. Relies on a DSL, sometimes similar to SQL. Instead of applying queries/logic to stored data, the data flows through in-memory queries that update state immediately.
  • 44. Example: SELECT user_id, card, avg(amount) AS avg_amount, count() AS num_trx, count() / (last().timestamp - first().timestamp) AS velocity FROM transactions[24 hours] GROUP BY user_id, card;
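
To make the semantics of the slide-44 query concrete, here is a rough Python equivalent that keeps a 24-hour window per `(user_id, card)` in memory and recomputes the three aggregates on every event. It is a sketch under assumptions, not how a CEP engine is implemented: timestamps are taken to be seconds, events are assumed to arrive in order, and `TransactionWindow` is a hypothetical name.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60


class TransactionWindow:
    """In-memory 24h window per (user_id, card), updated on every event."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window = window_seconds
        self.state = defaultdict(deque)  # (user_id, card) -> (timestamp, amount)

    def on_event(self, user_id, card, amount, timestamp):
        events = self.state[(user_id, card)]
        events.append((timestamp, amount))
        # Drop events that fell out of the window.
        while events[0][0] <= timestamp - self.window:
            events.popleft()
        num_trx = len(events)
        avg_amount = sum(a for _, a in events) / num_trx
        span = events[-1][0] - events[0][0]
        # Velocity is undefined for a single event (zero time span).
        velocity = num_trx / span if span > 0 else None
        return {
            "user_id": user_id,
            "card": card,
            "avg_amount": avg_amount,
            "num_trx": num_trx,
            "velocity": velocity,
        }


window = TransactionWindow()
first = window.on_event("u1", "c1", 100.0, 0)
second = window.on_event("u1", "c1", 300.0, 100)
third = window.on_event("u1", "c1", 200.0, 90000)
print(second)
```

The key point from slide 43 shows up in `on_event`: the event flows through a standing query and the state is updated immediately, rather than a query being run against stored data.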
  • 45. Common CEP Operations: filtering; correlation; windowing; transformation; aggregation/grouping; merging/union; sorting; pattern detection.
  • 46. Complex Event Processing at Scale. CEP technology is usually reliant on in-memory processing. To handle long-term profiles you need to pair it with distributed data processing platforms. The ability to replay historical data as in production should be a core requirement for the whole system.
  • 47. Other Tips
  • 48. Support 0-downtime deployment of new models in staging mode
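
One possible reading of "staging mode" on slide 48 is shadow scoring: the new model scores live traffic alongside the current one, its output is only logged, and promotion is a reference swap with no restart. The sketch below assumes that reading; `ModelRouter` is a hypothetical name and the models are plain callables.

```python
class ModelRouter:
    """Serve the live model while a staged model scores in shadow."""

    def __init__(self, live_model, staged_model=None):
        self.live = live_model
        self.staged = staged_model
        self.shadow_log = []  # (features, live_score, staged_score)

    def score(self, features):
        live_score = self.live(features)
        if self.staged is not None:
            try:
                staged_score = self.staged(features)
            except Exception:
                # A failing staged model must never affect live decisions.
                staged_score = None
            self.shadow_log.append((features, live_score, staged_score))
        return live_score

    def promote(self):
        """Swap the staged model in; callers keep using the same router."""
        if self.staged is None:
            raise ValueError("no staged model to promote")
        self.live, self.staged = self.staged, None


router = ModelRouter(live_model=lambda f: 0.1, staged_model=lambda f: 0.9)
before = router.score([1.0])
router.promote()
after = router.score([1.0])
print(before, after, router.shadow_log)
```

Comparing the logged live and staged scores on real traffic gives evidence about the new model before it is allowed to influence any decision.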
  • 49. Good (consistent) Data > Lots of Data
  • 50. Do things that don’t scale: look at specific data rows; open the CSVs; use SQL to try to find new insights
  • 51. Throw away good data (wait, what?)
  • 52. Fraud is extremely unbalanced: >99.5% good transactions, <0.5% fraud. Use undersampling to drop good transactions.
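
The undersampling on slide 52 can be sketched as keeping every fraud row and a random subset of the good rows. The `undersample` helper, the `is_fraud` label key and the 10:1 ratio below are assumptions for illustration; the slides do not state a target ratio.

```python
import random


def undersample(rows, label_key="is_fraud", ratio=10, seed=42):
    """Keep all fraud rows and at most `ratio` good rows per fraud row."""
    fraud = [row for row in rows if row[label_key]]
    good = [row for row in rows if not row[label_key]]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    keep = min(len(good), ratio * len(fraud))
    sampled = fraud + rng.sample(good, keep)
    rng.shuffle(sampled)
    return sampled


# Synthetic data with the class balance from the slide: 0.5% fraud.
rows = [{"id": i, "is_fraud": i < 5} for i in range(1000)]
balanced = undersample(rows)
fraud_kept = sum(row["is_fraud"] for row in balanced)
print(len(balanced), fraud_kept)
```

Training on the reduced set changes the class prior the model sees, so scores interpreted as probabilities generally need recalibrating against the original distribution.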
  • 53. Key Takeaways. Design APIs with comprehensive native fields but allow custom data. Data Processing is 90% of Machine Learning. Must have: full reproducibility of production behavior offline and for training. Combine CEP and streaming with distributed batch processing. Combine Machine Learning with Human Intelligence.
  • 54. Feedzai: Machine Learning. Slide sections: Mission, Quick Facts, Investors, What Others Say. Mission: keep commerce safe and create a better customer experience through machine learning. Quick facts: Top 50 High Growth startups in Europe, fastest-growing startup in Portugal; founded by data scientists and aerospace engineers in 2009; 120+ employees and doubling; offices in Portugal, Silicon Valley, New York City, London; Series B funded by Oak HC/FT and Sapphire Ventures (SAP). What others say: “The U.S. market fraud prevention just got a new player.” “Feedzai’s machine learning is the next wave.” “Ranked as a cool technology to watch.” “Startups that are owning the data game.” “Payment Card Management: Essential tools for U.S. card issuers.”
  • 55. Want to be a data Samurai? We’re hiring! feedzai.com/about-us/careers
  • 56. Want to be a data Samurai? We’re hiring! feedzai.com/about-us/careers
  • 57. References
      • Automation like Iron Man, not Ultron and the Leftover Principle: http://queue.acm.org/detail.cfm?id=2841313
      • Six novel ML applications: http://www.forbes.com/sites/85broads/2014/01/06/six-novel-machine-learning-applications/#360c101567bf
      • Complex Event Processing with Esper: http://www.slideshare.net/antonio_alegria/complex-event-processing-with-esper-10122384
      • Approaching almost any ML problem: http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/
      • https://www.datarobot.com/
      • XGBoost tutorial: http://xgboost.readthedocs.io/en/latest/model.html

Editor's Notes

  1. Feedzai’s mission is to keep commerce safe and to stop bad guys (criminals) from stealing money, whether through payment fraud, account takeover or other kinds of abuse. (2 min)