Lenny Evans, Data Scientist
Richard Ash, Software Engineer
Karthik Ramasamy, Senior Software Engineer
uScan
Uber in Numbers
4 Billion Trips in 2017
15 Million Uber trips per day
75 Million monthly active riders
600+ cities across 78 countries
Payment Fraud
Account Takeovers
GPS Spoofing
● Give users better experience when adding credit cards
○ Potential for easier signup experience
● Prevent payment fraud
○ Fraudsters rarely have the physical card
○ Ask users to produce a card with the correct number
○ A soft challenge is a better experience for false positives (FP)
Scanning Credit Cards
Demo
Why is this hard?
● Card numbers in different places
● Many numbers (e.g. phone numbers) on card
● Cards use different (even non-monospace!) fonts
● Want robustness to rotations and tilt for better UX
Design Choices: Mobile vs Server Side Inferencing
Mobile inferencing
+ Better UX
+ Better privacy
+ Less use of mobile bandwidth
- Less compute resources
- Less control over model deployment / deployment frameworks still in infancy
- Increase in app size
Why deep learning?
● Hand-engineered features not robust enough for our needs
● Recent advancements allow concurrent detection and classification
● Easy to include detection of other features (like logos!)
[Figure: object detection model followed by post-processing; example card number 5424 1811 1111 1458]
How does the uScan SSD model detect and classify text?
(Actual input boxes are more dense)
Model adjusts input boxes to detect numbers and classify them
[Figure: Input Image → Conv. layers (MobileNet, feature extractor) → 1x1 convolution (BB predictor: BB offsets, BB class confidences) → Non-Maximum Suppression (dedupe)]
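As a rough illustration of how a detector can "adjust input boxes", the sketch below decodes one predicted offset vector against one input (anchor) box and picks the most confident class. The center/size parameterization and the names `decode_box` and `classify` are assumptions borrowed from common SSD implementations; the slides do not state which parameterization uScan uses.

```python
from math import exp

def decode_box(anchor, offsets):
    """Apply predicted offsets to an anchor box.

    anchor:  (cx, cy, w, h) of the input box.
    offsets: (tx, ty, tw, th) predicted by the network (assumed encoding).
    Returns the adjusted box as (x_min, y_min, x_max, y_max).
    """
    cx, cy, w, h = anchor
    tx, ty, tw, th = offsets
    new_cx = cx + tx * w      # shift the center by a fraction of the anchor size
    new_cy = cy + ty * h
    new_w = w * exp(tw)       # scale width and height multiplicatively
    new_h = h * exp(th)
    return (new_cx - new_w / 2, new_cy - new_h / 2,
            new_cx + new_w / 2, new_cy + new_h / 2)

def classify(confidences, labels="0123456789"):
    """Return the label with the highest confidence and that confidence."""
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return labels[best], confidences[best]
```

With all-zero offsets the anchor is returned unchanged, which is a quick sanity check for any decoder of this kind.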
Simplified Single Shot Detection (SSD) Model
Architecture is similar to typical convnets, except that the model produces many outputs
Only one feature extractor is needed, as the text is all similar in size
[Figure: Input Image → MobileNet → 1x1 convolution (BB offsets, BB class confidences) → Non-Maximum Suppression]
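The deduplication step can be sketched as a greedy, class-agnostic non-maximum suppression. The 0.5 IoU threshold and the `(box, score)` tuple layout are assumptions for illustration, not values taken from the slides.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Keep the highest-scoring box among heavily overlapping ones.

    detections: list of (box, score) tuples.
    """
    kept = []
    # Visit boxes from most to least confident.
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept
```

Because the model emits a prediction for every input box, several neighboring boxes usually fire on the same digit; this pass keeps one of them.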
Why SSD?
SSD is fully convolutional
MobileNet allows for efficient inference
Training Data
● Collecting cards, taking pictures, and labeling is an expensive, time-consuming process
● Synthetically generated images
○ Nearly infinite dataset
○ Challenge is making synthetic images look like real cards and match real-world conditions
Training Data
Generate ~1M synthetic images
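A minimal sketch of the labeling half of synthetic data generation: it draws a random 16-digit number with a valid Luhn check digit and lays out one bounding box per digit in four groups of four. Rendering the digits with real fonts, backgrounds, and lighting is the hard part and is not shown. The function names, image size, and spacing ranges are all hypothetical.

```python
import random

def luhn_check_digit(partial):
    """Check digit that makes partial + digit pass the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:          # doubled once the check digit is appended
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def synth_sample(rng, width=480, height=300):
    """Return (card_number, boxes) for one synthetic card layout."""
    body = "".join(rng.choice("0123456789") for _ in range(15))
    number = body + luhn_check_digit(body)
    char_w, char_h = rng.randint(16, 22), rng.randint(24, 32)
    gap = rng.randint(6, 14)                    # space between 4-digit groups
    total_w = 16 * char_w + 3 * gap
    x = rng.randint(0, width - total_w)         # random placement on the card
    y = rng.randint(0, height - char_h)
    boxes = []
    for i, ch in enumerate(number):
        if i and i % 4 == 0:
            x += gap
        boxes.append({"label": ch, "box": (x, y, x + char_w, y + char_h)})
        x += char_w
    return number, boxes
```

Seeding the generator, for example `synth_sample(random.Random(0))`, makes a sample reproducible, which helps when debugging a labeling pipeline.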
Training Process
● AWS P3 instances with V100 GPUs
● Training (2 epochs) takes ~1 day with 1 GPU
● Always initialize weights with previous best model
Validation
Manually label single images of credit cards
Compare precision/recall of models on real cards
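One way to score a model on labeled real cards is sketched below. The definitions are an assumption, since the slides do not spell them out: precision is the share of returned numbers that are correct, and recall is the share of all labeled cards that were read correctly. A prediction of `None` means the model returned no number for that image.

```python
def precision_recall(predictions, truths):
    """Card-level precision and recall.

    predictions: list of predicted card numbers (str) or None per image.
    truths:      list of manually labeled card numbers, same order.
    """
    returned = sum(p is not None for p in predictions)
    correct = sum(p == t for p, t in zip(predictions, truths) if p is not None)
    precision = correct / returned if returned else 0.0
    recall = correct / len(truths) if truths else 0.0
    return precision, recall
```

Comparing two models then reduces to running both over the same labeled set and comparing the resulting pairs.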
Post-processing
● How do we get the card number from a list of bounding boxes?
○ Clustering! (DBSCAN)
○ Perform checks (e.g. checksum) on each cluster to determine whether it is a credit card number
● Aggregate multiple frames for even better coverage
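The clustering and checksum steps can be sketched as follows: a small DBSCAN over the centers of detected digit boxes, then a Luhn checksum on each cluster read left to right. This is a sketch under assumptions: the slides say only "checksum", and the Luhn algorithm, the `eps` and `min_pts` values, and the 13 to 19 digit length filter are not from the source.

```python
from math import hypot

def luhn_valid(number):
    """True if the digit string passes the Luhn checksum."""
    if not number.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one cluster id per point, -1 for noise."""
    n = len(points)
    labels = [None] * n

    def region(i):
        xi, yi = points[i]
        return [j for j in range(n)
                if hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1                      # noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster             # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = region(j)
            if len(nb) >= min_pts:              # core point: keep expanding
                queue.extend(nb)
    return labels

def find_card_number(detections, eps=15.0, min_pts=3):
    """detections: list of (x_center, y_center, digit_char) tuples."""
    labels = dbscan([(x, y) for x, y, _ in detections], eps, min_pts)
    for c in sorted(set(labels) - {-1}):
        members = sorted((d for d, l in zip(detections, labels) if l == c),
                         key=lambda d: d[0])    # read left to right
        candidate = "".join(d[2] for d in members)
        if 13 <= len(candidate) <= 19 and luhn_valid(candidate):
            return candidate
    return None
```

A density-based method fits here because the number of text groups on a card is not known in advance, and stray detections can simply be labeled as noise.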
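Aggregating over frames can be as simple as a vote across per-frame readings. The sketch below assumes each frame yields either a candidate number or `None`; the function name and the `min_votes` threshold are hypothetical, and the slides do not describe how uScan actually combines frames.

```python
from collections import Counter

def aggregate_frames(readings, min_votes=2):
    """Return the most frequent per-frame reading, if seen often enough.

    readings: list of card-number strings, or None for frames with no read.
    """
    counts = Counter(r for r in readings if r)
    if not counts:
        return None
    number, votes = counts.most_common(1)[0]
    return number if votes >= min_votes else None
```

Requiring agreement between frames trades a slightly longer scan for fewer misreads from a single blurry or glare-affected frame.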
Deploying on Mobile
Which Framework?
● CoreML
○ + Performance
○ - iOS 11 only
● TensorFlow
○ + Flexibility
○ - Binary Size/Performance
Optimizing for Mobile
Three metrics:
● TensorFlow library size (MB)
● uScan model size (MB)
● Performance (inferences/second)
Optimizing for Mobile
Start:
● TensorFlow Library Size — 25 MB
● uScan model Size — 25 MB
● Performance — 0.5-1.0 inferences/second
Optimizing for Mobile
Current:
● TensorFlow Library Size — 1.14 MB
● uScan model Size — 1 MB
● Performance — 5 inferences/second
uScan reads >97% of cards!
Optimizing for Mobile
How?
● TensorFlow Library Size — 25 -> 1.14 MB
○ Selective Registration
● uScan model Size — 25 -> 1 MB
○ Quantization, model optimizations
● Performance — 1 -> 5 inferences/second
○ Threading, memory & model optimizations
Cool way to get 20% more inferences per second on iPhone!
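To illustrate why quantization shrinks a model, the sketch below maps float weights to 8-bit integers with an affine scale and offset, then restores approximate values. It is a generic illustration, not the TensorFlow tooling the team used. Storing one byte per weight instead of a 32-bit float gives roughly a 4x reduction on its own, so the rest of the 25 -> 1 MB drop would have to come from the other model optimizations.

```python
def quantize(weights):
    """Map floats to bytes in [0, 255]; returns (bytes, scale, offset)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0     # avoid a zero scale for constant weights
    q = bytes(min(255, max(0, round((w - lo) / scale))) for w in weights)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate float weights from the quantized bytes."""
    return [lo + b * scale for b in q]
```

The round-trip error per weight is bounded by about half the scale, which is why accuracy often survives this step when the weight range is narrow.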
Optimizing for Mobile
Next:
● TensorFlow Library Size — <300 KB
○ TF Lite, CoreML
● uScan model Size — 0.5-2 MB
○ MobileNetV2, detect more objects
● Performance — 10-15 inferences/second
○ MobileNetV2, TF Lite
The Future
● Object detection is not limited to text; we can also look for card features
● Performance optimizations
○ GPUs for inference (TF Lite + CoreML)
○ Use MobileNetV2
● Android
○ Model optimizations, deployment framework
● Other applications of on-device OCR at Uber
Thanks!

More Related Content

What's hot

A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUESA STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
ijcsitcejournal
 
Introduction to Fog Computing
Introduction to Fog ComputingIntroduction to Fog Computing
Introduction to Fog Computing
Er. Ajay Sirsat
 
Ride-sharing platforms.pptx
Ride-sharing platforms.pptxRide-sharing platforms.pptx
Ride-sharing platforms.pptx
University of Dhaka, Bangladesh
 
Spam filtering with Naive Bayes Algorithm
Spam filtering with Naive Bayes AlgorithmSpam filtering with Naive Bayes Algorithm
Spam filtering with Naive Bayes Algorithm
Akshay Pal
 
PPT steganography
PPT steganographyPPT steganography
PPT steganography
parvez Sharaf
 
Face recognition technology - BEST PPT
Face recognition technology - BEST PPTFace recognition technology - BEST PPT
Face recognition technology - BEST PPT
Siddharth Modi
 
Detecting the presence of cyberbullying using computer software
Detecting the presence of cyberbullying using computer softwareDetecting the presence of cyberbullying using computer software
Detecting the presence of cyberbullying using computer software
Ashish Arora
 
Iris ppt
Iris pptIris ppt
Iris ppt
Sri Harati K
 
Presentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social NetworksPresentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social Networks
Ashish Arora
 
Mobile payment
Mobile paymentMobile payment
Mobile payment
Savvycom Savvycom
 
Fingerprint Recognition Technique(PPT)
Fingerprint Recognition Technique(PPT)Fingerprint Recognition Technique(PPT)
Fingerprint Recognition Technique(PPT)
Sandeep Kumar Panda
 
car number plate detection using matlab image & video processing
car number plate detection using matlab image & video processingcar number plate detection using matlab image & video processing
car number plate detection using matlab image & video processing
Kesava Korukonda
 
Digital signature 2
Digital signature 2Digital signature 2
Digital signature 2
Ankita Dave
 
Pervasive computing
Pervasive computingPervasive computing
Pervasive computing
Preethi AKNR
 
Uber
UberUber
Iris recognition
Iris recognition Iris recognition
Iris recognition
Jayati Bhattacharyya
 
FINGERPRINT BASED ATM SYSTEM
FINGERPRINT BASED ATM SYSTEMFINGERPRINT BASED ATM SYSTEM
FINGERPRINT BASED ATM SYSTEM
Journal For Research
 
Cloud Infrastructure Mechanisms
Cloud Infrastructure MechanismsCloud Infrastructure Mechanisms
Cloud Infrastructure Mechanisms
Mohammed Sajjad Ali
 
Iot and cloud computing
Iot and cloud computingIot and cloud computing
Iot and cloud computing
eteshagarwal1
 
Final year ppt
Final year pptFinal year ppt
Final year ppt
Shruti Chandra
 

What's hot (20)

A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUESA STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
 
Introduction to Fog Computing
Introduction to Fog ComputingIntroduction to Fog Computing
Introduction to Fog Computing
 
Ride-sharing platforms.pptx
Ride-sharing platforms.pptxRide-sharing platforms.pptx
Ride-sharing platforms.pptx
 
Spam filtering with Naive Bayes Algorithm
Spam filtering with Naive Bayes AlgorithmSpam filtering with Naive Bayes Algorithm
Spam filtering with Naive Bayes Algorithm
 
PPT steganography
PPT steganographyPPT steganography
PPT steganography
 
Face recognition technology - BEST PPT
Face recognition technology - BEST PPTFace recognition technology - BEST PPT
Face recognition technology - BEST PPT
 
Detecting the presence of cyberbullying using computer software
Detecting the presence of cyberbullying using computer softwareDetecting the presence of cyberbullying using computer software
Detecting the presence of cyberbullying using computer software
 
Iris ppt
Iris pptIris ppt
Iris ppt
 
Presentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social NetworksPresentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social Networks
 
Mobile payment
Mobile paymentMobile payment
Mobile payment
 
Fingerprint Recognition Technique(PPT)
Fingerprint Recognition Technique(PPT)Fingerprint Recognition Technique(PPT)
Fingerprint Recognition Technique(PPT)
 
car number plate detection using matlab image & video processing
car number plate detection using matlab image & video processingcar number plate detection using matlab image & video processing
car number plate detection using matlab image & video processing
 
Digital signature 2
Digital signature 2Digital signature 2
Digital signature 2
 
Pervasive computing
Pervasive computingPervasive computing
Pervasive computing
 
Uber
UberUber
Uber
 
Iris recognition
Iris recognition Iris recognition
Iris recognition
 
FINGERPRINT BASED ATM SYSTEM
FINGERPRINT BASED ATM SYSTEMFINGERPRINT BASED ATM SYSTEM
FINGERPRINT BASED ATM SYSTEM
 
Cloud Infrastructure Mechanisms
Cloud Infrastructure MechanismsCloud Infrastructure Mechanisms
Cloud Infrastructure Mechanisms
 
Iot and cloud computing
Iot and cloud computingIot and cloud computing
Iot and cloud computing
 
Final year ppt
Final year pptFinal year ppt
Final year ppt
 

Similar to Validating credit cards on mobile using deep learning

Build A Scalable Mobile App
Build A Scalable Mobile App Build A Scalable Mobile App
Build A Scalable Mobile App
Mohamed Aboul-Fotouh
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
NETWAYS
 
SaaS startups - Software Engineering Challenges
SaaS startups - Software Engineering ChallengesSaaS startups - Software Engineering Challenges
SaaS startups - Software Engineering Challenges
Malinda Kapuruge
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
Rob Winters
 
Simply Business' Data Platform
Simply Business' Data PlatformSimply Business' Data Platform
Simply Business' Data Platform
Dani Solà Lagares
 
[TTT Meetup] Enhance mobile app testing with performance-centric strategies (...
[TTT Meetup] Enhance mobile app testing with performance-centric strategies (...[TTT Meetup] Enhance mobile app testing with performance-centric strategies (...
[TTT Meetup] Enhance mobile app testing with performance-centric strategies (...
NITHIN S.S
 
Using FME to Automate Data Integration in a City
Using FME to Automate Data Integration in a CityUsing FME to Automate Data Integration in a City
Using FME to Automate Data Integration in a City
Safe Software
 
Online voting system presentation slide (1)
Online voting system presentation slide (1)Online voting system presentation slide (1)
Online voting system presentation slide (1)
wasi0013
 
Improving Mobile Payments With Real time Spark
Improving Mobile Payments With Real time SparkImproving Mobile Payments With Real time Spark
Improving Mobile Payments With Real time Spark
datamantra
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
Ed Hunter
 
Big Data and User Segmentation in Mobile Context
Big Data and User Segmentation in Mobile ContextBig Data and User Segmentation in Mobile Context
Big Data and User Segmentation in Mobile Context
InMobi Technology
 
Introduction to Big Data using AWS Services
Introduction to Big Data using AWS ServicesIntroduction to Big Data using AWS Services
Introduction to Big Data using AWS Services
Anjani Phuyal
 
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics PlatformWSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2
 
Michelangelo - Machine Learning Platform - 2018
Michelangelo - Machine Learning Platform - 2018Michelangelo - Machine Learning Platform - 2018
Michelangelo - Machine Learning Platform - 2018
Karthik Murugesan
 
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Costanoa Ventures
 
Introduction to Backend Development (1).pptx
Introduction to Backend Development (1).pptxIntroduction to Backend Development (1).pptx
Introduction to Backend Development (1).pptx
OsuGodbless
 
Grab at Scale with Scylla
Grab at Scale with ScyllaGrab at Scale with Scylla
Grab at Scale with Scylla
ScyllaDB
 
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
Amazon Web Services
 
Designing and coding for cloud-native applications using Python, Harjinder Mi...
Designing and coding for cloud-native applications using Python, Harjinder Mi...Designing and coding for cloud-native applications using Python, Harjinder Mi...
Designing and coding for cloud-native applications using Python, Harjinder Mi...
Pôle Systematic Paris-Region
 
Modern Software Architectures - Overview
Modern Software Architectures - Overview Modern Software Architectures - Overview
Modern Software Architectures - Overview
CodeOps Technologies LLP
 

Similar to Validating credit cards on mobile using deep learning (20)

Build A Scalable Mobile App
Build A Scalable Mobile App Build A Scalable Mobile App
Build A Scalable Mobile App
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
 
SaaS startups - Software Engineering Challenges
SaaS startups - Software Engineering ChallengesSaaS startups - Software Engineering Challenges
SaaS startups - Software Engineering Challenges
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
 
Simply Business' Data Platform
Simply Business' Data PlatformSimply Business' Data Platform
Simply Business' Data Platform
 
[TTT Meetup] Enhance mobile app testing with performance-centric strategies (...
[TTT Meetup] Enhance mobile app testing with performance-centric strategies (...[TTT Meetup] Enhance mobile app testing with performance-centric strategies (...
[TTT Meetup] Enhance mobile app testing with performance-centric strategies (...
 
Using FME to Automate Data Integration in a City
Using FME to Automate Data Integration in a CityUsing FME to Automate Data Integration in a City
Using FME to Automate Data Integration in a City
 
Online voting system presentation slide (1)
Online voting system presentation slide (1)Online voting system presentation slide (1)
Online voting system presentation slide (1)
 
Improving Mobile Payments With Real time Spark
Improving Mobile Payments With Real time SparkImproving Mobile Payments With Real time Spark
Improving Mobile Payments With Real time Spark
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
 
Big Data and User Segmentation in Mobile Context
Big Data and User Segmentation in Mobile ContextBig Data and User Segmentation in Mobile Context
Big Data and User Segmentation in Mobile Context
 
Introduction to Big Data using AWS Services
Introduction to Big Data using AWS ServicesIntroduction to Big Data using AWS Services
Introduction to Big Data using AWS Services
 
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics PlatformWSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
 
Michelangelo - Machine Learning Platform - 2018
Michelangelo - Machine Learning Platform - 2018Michelangelo - Machine Learning Platform - 2018
Michelangelo - Machine Learning Platform - 2018
 
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
 
Introduction to Backend Development (1).pptx
Introduction to Backend Development (1).pptxIntroduction to Backend Development (1).pptx
Introduction to Backend Development (1).pptx
 
Grab at Scale with Scylla
Grab at Scale with ScyllaGrab at Scale with Scylla
Grab at Scale with Scylla
 
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
 
Designing and coding for cloud-native applications using Python, Harjinder Mi...
Designing and coding for cloud-native applications using Python, Harjinder Mi...Designing and coding for cloud-native applications using Python, Harjinder Mi...
Designing and coding for cloud-native applications using Python, Harjinder Mi...
 
Modern Software Architectures - Overview
Modern Software Architectures - Overview Modern Software Architectures - Overview
Modern Software Architectures - Overview
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Validating credit cards on mobile using deep learning

  • 1. Lenny Evans, Data Scientist Richard Ash, Software Engineer Karthik Ramasamy, Senior Software Engineer uScan
  • 2. Uber in Numbers 4 Billion Trips in 2017 15 Million Uber trips per day 75 Million monthly active riders 600+ cities across 78 countries
  • 4. ● Give users better experience when adding credit cards ○ Potential for easier signup experience ● Prevent payment fraud ○ Fraudsters rarely have physical card ○ Ask users to produce a card with correct number ○ Soft challenge is a better FP experience Scanning Credit Cards
  • 5.
  • 7. Why is this hard? ● Card numbers in different places ● Many numbers (e.g. phone numbers) on card ● Cards use different (even non-monospace!) fonts ● Want robustness to rotations and tilt for better UX
  • 8. Design Choices: Mobile vs Server Side Inferencing Mobile inferencing + Better UX + Better privacy + Less use of mobile bandwidth - Less compute resources - Less control over model deployment; deployment frameworks still in infancy - Increase in app size
  • 9. Why deep learning? ● Hand-engineered features not robust enough for our needs ● Recent advancements allow concurrent detection and classification ● Easy to include detection of other features (like logos!)
  • 10. 5424 1811 1111 1458 Object detection model Post Processing
  • 11. How does uScan SSD model detect and classify text? (Actual input boxes are more dense) Model adjusts input boxes to detect numbers and classify Conv. Layers
  • 12. Simplified Single Shot Detection (SSD) Model: Input Image → Mobilenet (Feature Extractor) → 1x1 Convolution (BB Predictor) → BB offsets, BB class confidences → Non-Maximum Suppression (Dedupe). Architecture is similar to typical convnets, except that the model produces many outputs. Only need one feature extractor as text is all similar in size
  • 13. Mobilenet 1x1 Convolution BB offsets, BB class confidences Input Image Non-Maximum Suppression Why SSD? SSD is fully convolutional Mobilenet allows for efficient inference
  • 14. Training Data ● Collecting cards, taking pictures, and labeling is an expensive, time-consuming process ● Synthetically generated images ○ Nearly infinite dataset ○ Challenge is making synthetic images look like real cards and match real-world conditions
  • 15. Training Data Generate ~1M synthetic images
  • 16. Training Process ● AWS P3 instances with V100 GPUs ● Training (2 epochs) takes ~1 day with 1 GPU ● Always initialize weights with previous best model
  • 17. Validation Manually label single images of credit cards Compare precision/recall of models on real cards
  • 18. Post-processing ● How do we get card number from a list of bounding boxes? ○ Clustering! (DBSCAN) ○ Perform checks (e.g. checksum) on each cluster to determine whether it is a credit card number ● Aggregate multiple frames for even better coverage
  • 19. Deploying on Mobile Which Framework? ● CoreML ○ + Performance ○ - iOS 11 only ● TensorFlow ○ + Flexibility ○ - Binary Size/Performance
  • 20. Optimizing for Mobile Three metrics: ● TensorFlow library size (MB) ● uScan model size (MB) ● Performance (inferences/second)
  • 21. Optimizing for Mobile Start: ● TensorFlow Library Size — 25 MB ● uScan model Size — 25 MB ● Performance — 0.5-1.0 inferences/second
  • 22. Optimizing for Mobile Current: ● TensorFlow Library Size — 1.14 MB ● uScan model Size — 1 MB ● Performance — 5 inferences/second uScan reads >97% of cards!
  • 23. Optimizing for Mobile How? ● TensorFlow Library Size — 25 -> 1.14 MB ○ Selective Registration ● uScan model Size — 25 -> 1 MB ○ Quantization, model optimizations ● Performance — 1 -> 5 inferences/second ○ Threading, memory & model optimizations
  • 24. Cool way to get 20% more inferences per second on iPhone!
  • 25.
  • 26. Optimizing for Mobile Next: ● TensorFlow Library Size — <300 KB ○ TF Lite, CoreML ● uScan model Size — 0.5-2 MB ○ MobileNetV2, detect more objects ● Performance — 10-15 inferences/second ○ MobileNetV2, TF Lite
  • 27. The Future ● Object detection is not limited to text, we can look for card features ● Performance optimizations ○ GPUs for inference (TF-Lite + CoreML) ○ Use MobileNet v2 ● Android ○ Model optimizations, deployment framework ● Other applications of on-device OCR at Uber
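
Slides 12 and 13 name Non-Maximum Suppression as the dedupe step that follows the bounding-box predictor. The sketch below shows a generic greedy version of that step in plain Python. It is not the uScan implementation: the box layout `(x_min, y_min, x_max, y_max)`, the function names, and the `iou_threshold` default of 0.5 are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: List[Box], scores: List[float], iou_threshold: float = 0.5
) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it.

    Returns the indices of the kept boxes, highest score first.
    The threshold value is a hypothetical default, not taken from the slides.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    example_scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(example_boxes, example_scores))
```

In a digit detector the same idea would typically run per class, so that two overlapping predictions of the same digit collapse into one detection.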

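Slide 18 describes post-processing as clustering the detected digits and then running checks such as a checksum on each cluster. The sketch below illustrates that flow under stated assumptions. The slides name DBSCAN for clustering; to stay dependency-free, this example swaps in a much simpler grouping by vertical position, which is not DBSCAN. The checksum shown is the Luhn algorithm, the standard check for payment card numbers, although the slides do not say which checks uScan applies. The `Detection` layout, the function names, `y_tolerance`, and the 12 to 19 digit length bounds are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed detection layout for this sketch: (x_center, y_center, digit).
Detection = Tuple[float, float, str]


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    for idx, d in enumerate(reversed(digits)):
        if idx % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def group_into_rows(dets: List[Detection], y_tolerance: float = 5.0) -> List[str]:
    """Group digits into rows by y position (a simple stand-in for DBSCAN)."""
    rows: List[List[Detection]] = []
    for det in sorted(dets, key=lambda d: d[1]):
        if rows and abs(det[1] - rows[-1][-1][1]) <= y_tolerance:
            rows[-1].append(det)
        else:
            rows.append([det])
    # Read each row left to right to form a candidate digit string.
    return ["".join(d[2] for d in sorted(row, key=lambda d: d[0])) for row in rows]


def find_card_number(
    dets: List[Detection], min_len: int = 12, max_len: int = 19
) -> Optional[str]:
    """Return the first row that looks like a card number, or None."""
    for candidate in group_into_rows(dets):
        if min_len <= len(candidate) <= max_len and luhn_valid(candidate):
            return candidate
    return None


if __name__ == "__main__":
    card_row = [(10.0 * i, 50.0, c) for i, c in enumerate("4111111111111111")]
    phone_row = [(10.0 * i, 120.0, c) for i, c in enumerate("8005551234")]
    print(find_card_number(card_row + phone_row))
```

The length filter and the checksum together are what reject other digit runs on the card, such as phone numbers. Aggregating over several frames, as the slide suggests, could be layered on top by counting how often each candidate passes these checks.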
Editor's Notes

  1. [SAY] [DO] [KNOW]
  2. TODO: Make a nicer picture
  3. TODO: Make a nicer picture
  4. TODO: Make a nicer picture
  5. Add slide on why we went with synthetic data
  6. Measure performance in inferences/second
  7. Change language from FPS
  8. Change language from FPS
  9. Change language from FPS
  10. Change language from FPS