Fighting Fraud
Finding Duplicates at Scale
Alexey Grigorev
2019/10/09
About me
https://www.slideshare.net/AlexeyGrigorev/avito-duplicate-ads-detection-kaggle
Me now (Oct 2018)
Disclaimer
Not a presentation of the duplicate detection system at OLX
Back to duplicates
User generated content
Such description. So much text
User generated content
Problems:
● Illegal content
● NSFW content
● Duplicates
● Spam
● Fraud
Fraud / Duplicates
Two sides of the same coin
* I don’t necessarily think that dogecoin is fraud
Fraud and Duplicates
Such description
So much text
Fraud and Duplicates
Such good description,
so better text
Fraud and Duplicates
Such goud description,
so better text
Fraud and Duplicates
100$
deposit
How to prevent it?
Content moderation
ML
Such description
So much text
Accept
Reject
Moderation queue
MP
Automatic
moderation system
Moderation panel
Accept
Reject
Moderators
ML
Such description
So much text
Accept
Reject
Moderation queue
Automatic
moderation system
Duplicate
detection
Forbidden
items
Other ML
models
Duplicate detection
https://www.slideshare.net/AlexeyGrigorev/duplicates-everywhere-berlin
Duplicate detection framework
Step 1. Candidate selection step: find candidate duplicates (10-200)
Step 2. Candidate scoring step: get real duplicates (0-50)
How:
● Domain knowledge (heuristics)
● Information retrieval techniques
● Approximate knn
Candidate selection
● Category
● City / district
● Seller id
● IP address of the seller
● Device signature
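The selection heuristics above can be sketched as simple "blocking": two ads become a candidate pair if they share at least one key. This is only an illustration; the record fields, the sample ads, and the choice of keys are assumptions, not the production setup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical ad records; field names and values are illustrative only.
ADS = [
    {"id": 1, "category": "phones", "city": "Berlin", "seller_id": "s1", "ip": "10.0.0.1"},
    {"id": 2, "category": "phones", "city": "Berlin", "seller_id": "s2", "ip": "10.0.0.1"},
    {"id": 3, "category": "cars", "city": "Munich", "seller_id": "s1", "ip": "10.0.0.7"},
    {"id": 4, "category": "sofas", "city": "Hamburg", "seller_id": "s9", "ip": "10.0.0.9"},
]

# Each blocking key is a tuple of fields that must all match (assumed choice).
BLOCKING_KEYS = [("category", "city"), ("seller_id",), ("ip",)]


def candidate_pairs(ads, blocking_keys=BLOCKING_KEYS):
    """Return the set of (id1, id2) pairs that share at least one blocking key."""
    pairs = set()
    for fields in blocking_keys:
        buckets = defaultdict(list)
        for ad in ads:
            buckets[tuple(ad[f] for f in fields)].append(ad["id"])
        for ids in buckets.values():
            for a, b in combinations(sorted(ids), 2):
                pairs.add((a, b))
    return pairs


print(sorted(candidate_pairs(ADS)))
```

Ads 1 and 2 share category, city and IP; ads 1 and 3 share the seller; ad 4 matches nothing and produces no candidates.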
Candidate scoring
Machine Learning!
Duplicate / Not duplicate
Moderation queue
MP
Moderation panel
Accept
Reject
Moderators
Machine Learning
Labeled pairs:
ID1  ID2  Label
1    2    1
1    4    0
2    7    0
k    5    1
Feature engineering:
ID1  ID2  Features         Label
1    2    [0, 1, ..., 5]   1
1    4    [2, 0, ..., 3]   0
2    7    [3, 1, ..., 3]   0
k    5    [5, 3, ..., 8]   1
Model
Tune F1/Precision/Recall
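One way to read "tune F1/Precision/Recall" is picking the score threshold at which a pair is declared a duplicate. A minimal sketch, assuming the model outputs a score per pair; the function names and the sample scores are made up for illustration.

```python
def precision_recall_f1(labels, preds):
    """Compute precision, recall and F1 for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_threshold(labels, scores):
    """Try every observed score as a threshold and keep the one with the best F1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        _, _, f1 = precision_recall_f1(labels, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical labels (1 = duplicate) and model scores for four candidate pairs.
labels = [1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7]
print(best_threshold(labels, scores))
```

The same loop can optimise precision at a minimum recall (or the reverse) if wrongly rejected ads are more costly than missed duplicates.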
Features
Pairwise distances/similarities
● |price1 - price2|
● min(price1, price2) / max(price1, price2)
● dist(loc1, loc2)
● same(ip1, ip2)
● same(loc_id1, loc_id2)
● same(category1, category2)
● |len(title1) - len(title2)|
● |len(images1) - len(images2)|
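The pairwise features listed above could be computed roughly as follows. The record layout (`price`, `lat`, `lon`, `ip`, `loc_id`, `category`, `title`, `images`) is an assumption, and the haversine distance is just one possible choice for `dist(loc1, loc2)`.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (one possible dist(loc1, loc2))."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def pairwise_features(ad1, ad2):
    """Build the distance/similarity features for one candidate pair."""
    p1, p2 = ad1["price"], ad2["price"]
    highest = max(p1, p2)
    return {
        "price_abs_diff": abs(p1 - p2),
        "price_ratio": min(p1, p2) / highest if highest > 0 else 1.0,
        "geo_dist_km": haversine_km(ad1["lat"], ad1["lon"], ad2["lat"], ad2["lon"]),
        "same_ip": int(ad1["ip"] == ad2["ip"]),
        "same_loc_id": int(ad1["loc_id"] == ad2["loc_id"]),
        "same_category": int(ad1["category"] == ad2["category"]),
        "title_len_diff": abs(len(ad1["title"]) - len(ad2["title"])),
        "n_images_diff": abs(len(ad1["images"]) - len(ad2["images"])),
    }


# Hypothetical ads used only to exercise the function.
ad_a = {"price": 100, "lat": 52.52, "lon": 13.405, "ip": "1.1.1.1", "loc_id": 5,
        "category": "phones", "title": "new iphone", "images": ["a.jpg", "b.jpg"]}
ad_b = {"price": 80, "lat": 52.52, "lon": 13.405, "ip": "1.1.1.1", "loc_id": 5,
        "category": "phones", "title": "new iphone 8", "images": ["a.jpg"]}
features = pairwise_features(ad_a, ad_b)
print(features)
```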
Features
Pairwise distances/similarities
● Cosine between titles (TF-IDF)
● Cosine between description (TF-IDF)
● Word2Vec
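A small standard-library sketch of the TF-IDF cosine feature. The whitespace tokenizer and the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` are assumptions chosen for illustration; a real pipeline would more likely use a library vectorizer fitted on the whole corpus.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so terms present in every document keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "new iphone almost not used",
    "new iphone almost not used",
    "new iphone",
    "old wooden chair",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]), cosine(vecs[0], vecs[3]))
```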
Hashes
● md5: cryptographic hash
● dhash, phash, whash: perceptual hashes
Image 1: dhash 94088af86c038327, md5 14ee7fe587860078a1109033318bd986
Image 2: dhash 94088af86c038327, md5 07aaedb9b75e88a6051184f01be5cc50
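To illustrate why md5 only catches byte-for-byte copies, the sketch below flips a single bit in a stand-in "image" and compares the digests. The byte string is synthetic, not a real image file.

```python
import hashlib

# Synthetic stand-in for image bytes (an assumption; any bytes would do).
original = bytes(range(256)) * 4

# Flip the lowest bit of the last byte, as a re-encoded copy might differ.
modified = bytearray(original)
modified[-1] ^= 1

md5_original = hashlib.md5(original).hexdigest()
md5_modified = hashlib.md5(bytes(modified)).hexdigest()

print(md5_original)
print(md5_modified)
print("identical digests:", md5_original == md5_modified)
```

A perceptual hash is designed to behave the other way around: small pixel-level changes should leave most of its bits unchanged.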
Dhash: difference hash
https://www.kaggle.com/iezepov/get-hash-from-images-slightly-daster/code
Read as b/w image, resize
Get numpy array (8 rows, 9 columns):
       1    2    3    4    5    6    7    8    9
  1  149  168  145  131  134  111  115  114  108
  2  198  192  162  135  104  137  128  108   97
  3  158  165  151  117  111  133  130  139  115
  4   79   95  132  151  180  212  189  158  124
  5   91   47   57   90   67   81  165  142  110
  6  104   80   63   53   43   34   20   42  101
  7  110  113  109   92   79   53   27   59  102
  8  114  111  110  112  108   67   73   90  103
Difference between adjacent cells:
       1    2    3    4    5    6    7    8
  1   19  -23  -15    4  -23    4   -1   -6
  2   -6  -30  -27  -31   33   -9  -20  -11
  3    7  -14  -34   -6   22   -3    9  -24
  4   16   37   19   29   32  -23  -31  -34
  5  -44   10   33  -23   14   84  -23  -32
  6  -24  -17  -10  -10   -9  -14   22   59
  7    3   -3  -18  -13  -26  -26   32   43
  8   -3   -4    2   -4  -41    6   17   13
Is the difference positive?
     1      2      3      4      5      6      7      8
  1  TRUE   FALSE  FALSE  TRUE   FALSE  TRUE   FALSE  FALSE
  2  FALSE  FALSE  FALSE  FALSE  TRUE   FALSE  FALSE  FALSE
  3  TRUE   FALSE  FALSE  FALSE  TRUE   FALSE  TRUE   FALSE
  4  TRUE   TRUE   TRUE   TRUE   TRUE   FALSE  FALSE  FALSE
  5  FALSE  TRUE   TRUE   FALSE  TRUE   TRUE   FALSE  FALSE
  6  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  TRUE   TRUE
  7  TRUE   FALSE  FALSE  FALSE  FALSE  FALSE  TRUE   TRUE
  8  FALSE  FALSE  TRUE   FALSE  FALSE  TRUE   TRUE   TRUE
Each row read as one byte (decimal, hex):
  1  148  94
  2    8  08
  3  138  8a
  4  248  f8
  5  108  6c
  6    3  03
  7  131  83
  8   39  27
Hash: 94088af86c038327
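The walkthrough above can be reproduced on the 8x9 grayscale array from the slide. This sketch skips image loading and resizing and works on the numbers directly; the function name is made up, and it is not the code from the linked Kaggle kernel.

```python
# Grayscale values from the slide (8 rows, 9 columns).
GRAY = [
    [149, 168, 145, 131, 134, 111, 115, 114, 108],
    [198, 192, 162, 135, 104, 137, 128, 108, 97],
    [158, 165, 151, 117, 111, 133, 130, 139, 115],
    [79, 95, 132, 151, 180, 212, 189, 158, 124],
    [91, 47, 57, 90, 67, 81, 165, 142, 110],
    [104, 80, 63, 53, 43, 34, 20, 42, 101],
    [110, 113, 109, 92, 79, 53, 27, 59, 102],
    [114, 111, 110, 112, 108, 67, 73, 90, 103],
]


def dhash_from_gray(pixels):
    """Difference hash: one bit per adjacent pair, set when the right cell is brighter."""
    hex_parts = []
    for row in pixels:
        byte = 0
        for left, right in zip(row, row[1:]):
            byte = (byte << 1) | (1 if right > left else 0)
        # 9 columns give 8 comparisons, i.e. exactly one byte per row.
        hex_parts.append(f"{byte:02x}")
    return "".join(hex_parts)


print(dhash_from_gray(GRAY))  # 94088af86c038327
```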
Features: hashes
● Number of images with same md5, phash, dhash, etc
● Distances between hashes
94088af86c038327
94088af86c038328
Last four hex digits as bits:
8327: 1000 0011 0010 0111
8328: 1000 0011 0010 1000
Distance is 4 bits
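The bit distance between two hex hashes is the Hamming distance: XOR the two values and count the set bits. A minimal sketch (the function name is illustrative):

```python
def hamming_distance(hash1, hash2):
    """Number of differing bits between two hex-encoded hashes of equal length."""
    if len(hash1) != len(hash2):
        raise ValueError("hashes must have the same length")
    return bin(int(hash1, 16) ^ int(hash2, 16)).count("1")


print(hamming_distance("94088af86c038327", "94088af86c038328"))  # 4
```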
Candidate selection
● Category
● City / district
● Seller id
● IP address of the seller
● Device signature
● Image hashes
Image embeddings
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
“Image embeddings”
https://keras.io/applications/
Image embeddings
Image → CNN → embedding (dim 1k+) → SVD → embedding (dim 100) → LSH → 36a93c34a3abff
LSH: Random Projection
● Close in the original space ⇒ close in the projection
● Far in the original space ⇒ far in the projection
Generate the projection vectors once and store them somewhere
LSH: Random projections
https://www.slideshare.net/AlexeyGrigorev/duplicates-everywhere
Use the vectors to reduce the dimensionality and compute the hash
Store the hash in the database
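A rough sketch of random-projection LSH using only the standard library. The projection vectors are generated once from a fixed seed (standing in for "store them somewhere"), and each bit of the hash is the sign of one dot product. The dimension of 100 and the 56-bit length (14 hex characters, like the example hash on the embedding slide) are assumptions, and the input vector is synthetic rather than a real CNN embedding.

```python
import random


def make_projections(dim, n_bits, seed=42):
    """Generate n_bits random Gaussian vectors; the fixed seed makes them reproducible."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_hash(vector, projections):
    """One bit per projection: 1 if the dot product is non-negative, else 0."""
    bits = 0
    for p in projections:
        dot = sum(x * w for x, w in zip(vector, p))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    n_hex = (len(projections) + 3) // 4
    return f"{bits:0{n_hex}x}"


projections = make_projections(dim=100, n_bits=56)

# Synthetic 100-dimensional "embedding" (illustrative only).
vector = [((i * 37) % 11) - 5.0 for i in range(100)]
lsh = lsh_hash(vector, projections)
print(lsh)
```

Because only the sign of each projection is kept, vectors pointing in nearly the same direction tend to agree on most bits, so the resulting hashes can be compared with the same bit distance as the perceptual hashes.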
Why Elasticsearch?
● Well-known, convenient, stable and scalable inverted index (thanks, Lucene!)
Direct index (ImageID → Hash)
ImageID  Hash
1        00fc
2        12ec
3        00fc
4        ebe4
5        7a1f
6        00fc
7        8ef4
8        12ec
Inverted index (Hash → ImageID)
Hash  ImageID
00fc  1, 3, 6
12ec  2, 8
ebe4  4
7a1f  5
8ef4  7
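The two tables can be reproduced with a few lines: the inverted index maps each hash to the list of images that carry it, so finding everything that shares a hash is a single lookup. This is an in-memory illustration only; Lucene's actual data structures are far more elaborate.

```python
from collections import defaultdict

# Direct index from the slide: image ID -> hash.
DIRECT = {1: "00fc", 2: "12ec", 3: "00fc", 4: "ebe4",
          5: "7a1f", 6: "00fc", 7: "8ef4", 8: "12ec"}


def invert(direct):
    """Build hash -> sorted list of image IDs."""
    inverted = defaultdict(list)
    for image_id, image_hash in sorted(direct.items()):
        inverted[image_hash].append(image_id)
    return dict(inverted)


INVERTED = invert(DIRECT)
print(INVERTED["00fc"])  # [1, 3, 6]
```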
Elasticsearch for hashes
For “fuzzy lookups” chunk the hash:
"94088af86c038327" => "1:9408 2:8af8 3:6c03 4:8327"
{
  "_id": "cafebabe",
  "_source": {
    "title": "new iphone",
    "description": "new iphone almost not used",
    "hashes": ["1:9408 2:8af8 3:6c03 4:8327", ... ]
  }
}
Let Elasticsearch treat the chunks as regular tokens, e.g. with the whitespace tokenizer
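The chunking step can be sketched as below. Prefixing each chunk with its position means that two hashes only share a token when the same 16-bit chunk appears in the same place. The helper names are made up for illustration.

```python
def chunk_hash(hex_hash, chunk_size=4):
    """Split a hex hash into position-prefixed chunks joined by spaces."""
    chunks = [hex_hash[i:i + chunk_size] for i in range(0, len(hex_hash), chunk_size)]
    return " ".join(f"{pos}:{chunk}" for pos, chunk in enumerate(chunks, start=1))


def shared_chunks(hash1, hash2):
    """Tokens that two hashes have in common after chunking."""
    return set(chunk_hash(hash1).split()) & set(chunk_hash(hash2).split())


print(chunk_hash("94088af86c038327"))  # 1:9408 2:8af8 3:6c03 4:8327
print(sorted(shared_chunks("94088af86c038327", "94088af86c038328")))
```

The two hashes from the distance example differ only in the last chunk, so they still share three of four tokens and a token-based lookup would surface one as a candidate for the other.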
Implementation Details
“More like this” queries
{
  "query": {
    "more_like_this": {
      "like": {
        "_index": "listings",
        "_type": "_doc",
        "_id": "cafebabe"
      },
      "max_query_terms": 100,
      "fields": ["title^2", "description", "hashes"]
    }
  }
}
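A small helper that builds the same query body as a Python dict, ready to be serialised and sent to the search endpoint. Two things here are assumptions rather than part of the slide: `_type` is left out on the assumption of a recent Elasticsearch version without mapping types, and `min_term_freq` / `min_doc_freq` are set to 1 because, as far as I know, their defaults (2 and 5) would ignore hash chunks that occur once per document or in few documents. Treat both as settings to verify against the Elasticsearch documentation for your version.

```python
import json


def more_like_this_query(index, doc_id, fields, max_query_terms=100,
                         min_term_freq=1, min_doc_freq=1):
    """Build a more_like_this query body that uses an indexed document as the example."""
    return {
        "query": {
            "more_like_this": {
                "like": {"_index": index, "_id": doc_id},
                "fields": fields,
                "max_query_terms": max_query_terms,
                # Assumed overrides; check the defaults for your ES version.
                "min_term_freq": min_term_freq,
                "min_doc_freq": min_doc_freq,
            }
        }
    }


mlt_query = more_like_this_query("listings", "cafebabe",
                                 ["title^2", "description", "hashes"])
print(json.dumps(mlt_query, indent=2))
```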
How to actually do it?
Image index (simplified)
s3
Such description
So much text
Image index (simplified)
s3
ObjectCreated
{
  "eventName": "ObjectCreated:Put",
  "s3": {
    "bucket": { "name": "pictures" },
    "object": { "key": "doge.jpg" }
  }
}
Image index (simplified)
s3 → ObjectCreated → Hash calculation
https://pypi.org/project/ImageHash/
Image index (simplified)
hashes
Hash calculation
{
  "dhash": "9687678c367b7b3a",
  "phash": "ad60ad89b54b0d3d",
  "whash": "fbf3804003199f9f"
}
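A sketch of what the hash-calculation function might look like as an event handler. Real S3 notifications wrap each event in a `Records` list and URL-encode object keys, which the slide's simplified JSON omits. The actual download and hashing are replaced by a placeholder (`fake_compute_hashes`, a hypothetical stand-in that returns the values from the slide), since the real step would need the image bytes and an image-hashing library.

```python
from urllib.parse import unquote_plus


def fake_compute_hashes(bucket, key):
    """Placeholder for download + hashing; returns the example values from the slide."""
    return {
        "dhash": "9687678c367b7b3a",
        "phash": "ad60ad89b54b0d3d",
        "whash": "fbf3804003199f9f",
    }


def handle_event(event, compute_hashes=fake_compute_hashes):
    """Compute hashes for every newly created object mentioned in the event."""
    results = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        results.append({"bucket": bucket, "key": key,
                        "hashes": compute_hashes(bucket, key)})
    return results


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {"bucket": {"name": "pictures"}, "object": {"key": "doge.jpg"}},
        }
    ]
}
handled = handle_event(sample_event)
print(handled)
```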
Image index (simplified)
s3 → ObjectCreated → Hash calculation → hashes → Ingestor → ES
It scalez!
190 rps
[Charts: invocations per hour, invocations per day]
How about deletes?
Image index (still simplified)
s3 → ObjectCreated → Hash calculation → hashes → Ingestor → ES
s3 → ObjectDeleted → Ingestor → ES
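One way the ingestor could treat both event kinds is to translate them into index or delete actions keyed by the image. This is a sketch under assumptions: the action dictionaries are a made-up intermediate format rather than an Elasticsearch API, and using the object key as the document ID is a guess. To my knowledge S3 itself names deletion events `ObjectRemoved:*`, so the sketch accepts that prefix as well as the slide's `ObjectDeleted` label.

```python
def to_index_action(event_name, key, hashes=None):
    """Map a storage event to an action for the search index (hypothetical format)."""
    if event_name.startswith("ObjectCreated"):
        return {"action": "index", "id": key, "doc": {"hashes": hashes or {}}}
    if event_name.startswith(("ObjectRemoved", "ObjectDeleted")):
        return {"action": "delete", "id": key}
    return None  # ignore anything else


created = to_index_action("ObjectCreated:Put", "doge.jpg", {"dhash": "9687678c367b7b3a"})
removed = to_index_action("ObjectRemoved:Delete", "doge.jpg")
print(created)
print(removed)
```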
How about neural networks?
Image index
[Diagram: s3 ObjectCreated and ObjectDeleted events feed the ingestors, now with an added CNN+LSH step, which write to ES]
CNN+LSH
Options for deploying image models:
● Lambda
○ Tricky to do: HUGE TF binaries
○ 66 USD per 1 mln images
○ Worth it when load is low (< 1 mln images per day)
● Kubernetes
○ Easier to do: docker + existing cluster
○ More expensive for low load
○ Better for 1+ mln images per day
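The trade-off can be put into back-of-the-envelope arithmetic. Only the 66 USD per million figure comes from the slide; the cluster cost below is a purely hypothetical number chosen to show how a break-even volume would be derived, and it assumes the cluster cost is fixed regardless of load.

```python
LAMBDA_USD_PER_MILLION = 66.0  # from the slide


def lambda_cost_usd(n_images):
    """Pay-per-use cost of processing n_images on Lambda."""
    return n_images / 1_000_000 * LAMBDA_USD_PER_MILLION


def break_even_images(cluster_cost_usd):
    """Volume at which a fixed cluster cost equals the Lambda cost for the same period."""
    return cluster_cost_usd / LAMBDA_USD_PER_MILLION * 1_000_000


# Hypothetical fixed cluster cost for one day, NOT a figure from the talk.
HYPOTHETICAL_CLUSTER_USD_PER_DAY = 66.0

print(lambda_cost_usd(500_000))
print(break_even_images(HYPOTHETICAL_CLUSTER_USD_PER_DAY))
```

Below the break-even volume the pay-per-use option is cheaper; above it the fixed-cost cluster wins, which matches the direction of the recommendation on the slide.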
The big picture
ML
Such description
So much text
Accept
Reject
Moderation queue
MP
Automatic
moderation system
Moderation panel
Accept
Reject
Moderators
s3
ES
Duplicate
detection
system
Image index
Hashes
Thanks!
Questions?
