Reproducible data science and business solutions

From reproducible data science to business solutions
April 21st, 2021
We’ll be talking about
● Translation of business problems to technical solutions
● Secure medical records
● Problems in computer vision
○ Image quality enhancement aka ‘beautification’
○ Image similarity evaluation aka ‘matching’
○ Image classification aka ‘tagging’
About me
Antonio Rueda-Toicen
Senior Data Scientist at Parkling GmbH
● Work on computer vision
● Background in computer science & biomedical applications
● Previously worked in academia, now teach data science at DSR and Thinkful
● Currently host the Berlin Computer Vision Group (look us up on Meetup!)
Airmed Foundation: Secure medical records with IPFS and Hyperledger Fabric
https://airmedfoundation.thechain.tech/
https://github.com/the-chain/airmedfoundation-terminal
What is ‘computer vision’?
[Figure: what a human sees vs. what the computer ‘sees’]
Why we do computer vision at HomeToGo?
● We are a search engine of vacation rentals
● We have 17 million offers and hundreds of millions of images, the largest vacation rental inventory in the world
● Users want to envision the experience of a rental before booking
Image quality enhancement aka ‘beautification’
Industry story - AirBnB case
https://www.airbnb.com/professional_photography
Why do we need image beautification at HomeToGo?
Problem: we don’t control image acquisition
How does a change in image quality look?
                 iPhone 3GS camera     Canon 70D (DSLR camera)
Resolution       3 MP                  20 MP
Image size       2048 x 1536           3648 x 2432
Images shown     Original, blurred     Original, blurred
Industry’s current practices for enhancing images
Our use of GANs
Let’s look at some beautified images
Image Similarity Evaluation aka ‘Matching’
Why do we need to match offers?
● Inventory understanding (we have a lot of it!)
● Providing the best deals for our users (sample use case: strike prices)
Evaluating similarity
● Semantic similarity can be different to perceptual similarity
● We use a variety of distance and similarity metrics
● We also use different models ensembled in a deduplication pipeline
Perceptual Hashing
94088af86c03827 vs. 94088af86c03827: edit distance = 0
94088af86c03827 vs. 94088af86c03899: edit distance = 2
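The two hash comparisons above can be sketched as a position-by-position comparison of the hexadecimal hash strings. This is a minimal sketch, not the pipeline used in the talk: the function names are hypothetical, and counting differing hex characters is an assumption about how the slide's ‘edit distance’ was computed. Many perceptual hashing setups compare the hashes bit by bit instead, which is shown as a second function.

```python
def hex_char_distance(hash_a: str, hash_b: str) -> int:
    """Count the positions at which two equal-length hex hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def bit_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits (Hamming distance) between two hex hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


# Hash values taken from the slides above.
identical = hex_char_distance("94088af86c03827", "94088af86c03827")
near_duplicate = hex_char_distance("94088af86c03827", "94088af86c03899")
print(identical, near_duplicate)  # prints "0 2"
```

A small distance suggests near-duplicate images; the threshold that separates duplicates from non-duplicates has to be tuned on labeled data.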
How we evaluate our matching algorithms
True Positive = duplicate labeled as duplicate
True Negative = non-duplicate labeled as non-duplicate
False Positive = non-duplicate labeled as duplicate
False Negative = duplicate labeled as non-duplicate
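The four outcomes above are commonly summarized as precision and recall. The sketch below assumes each candidate pair of offers has a ground-truth flag and a predicted flag (True = duplicate); the function names and the sample labels are hypothetical and only illustrate the bookkeeping.

```python
def confusion_counts(y_true, y_pred):
    """Tally TP, TN, FP and FN for boolean duplicate labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual and predicted:
            counts["tp"] += 1  # duplicate labeled as duplicate
        elif not actual and not predicted:
            counts["tn"] += 1  # non-duplicate labeled as non-duplicate
        elif predicted:
            counts["fp"] += 1  # non-duplicate labeled as duplicate
        else:
            counts["fn"] += 1  # duplicate labeled as non-duplicate
    return counts


def precision_recall(counts):
    """Return (precision, recall); 0.0 when a denominator is empty."""
    predicted_positive = counts["tp"] + counts["fp"]
    actual_positive = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_positive if predicted_positive else 0.0
    recall = counts["tp"] / actual_positive if actual_positive else 0.0
    return precision, recall


# Hypothetical labels for five candidate pairs.
y_true = [True, True, False, False, True]
y_pred = [True, False, True, False, True]
counts = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(counts)
print(counts, precision, recall)
```

Precision drops with every false positive, which is why a matching system that merges distinct offers is penalized on this metric.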
Beware of false positives
Convolutional neural networks as feature extractors
[Figures: image pairs compared by the cosine similarity of their extracted features]
● Cosine similarity = 0.65
● Cosine similarity = 0.99
● Cosine similarity = 0.99
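Cosine similarity between two feature vectors can be computed as below. This is a minimal sketch in plain Python: the vectors are made-up stand-ins for the features a convolutional network would produce, and they do not reproduce the scores shown on the slides.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors (real CNN features have hundreds of dimensions).
features_a = [1.0, 0.0, 2.0]
features_b = [2.0, 0.0, 4.0]  # same direction as features_a
features_c = [0.0, 3.0, 0.0]  # orthogonal to features_a

print(cosine_similarity(features_a, features_b))
print(cosine_similarity(features_a, features_c))
```

Because the score depends only on direction and not on magnitude, two images whose features point the same way score close to 1 regardless of the features' scale.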
Image classification aka ‘Tagging’
Image Classification
What we see:
● Outdoor
● Building
● Snow
What the computer “sees”:
● Outdoor
● Building
● Snow? Do we care about snow?
○ Enough of these images need to be shown to the algorithm
Why do we do image classification?
● Inventory understanding
○ How many of our offers have pools, balconies, sea views?
○ Which images have better conversion rates?
● Targeted advertisement (SEO, CRM)
○ Newsletters
○ SEO landing pages
What do users care about?
● We do user research to define data taxonomies
● We also define which rules are convenient/feasible for our algorithms
○ E.g. ‘if the sky is visible but we are looking at it through a window, the image should be labeled as “indoor”’
ResNet
Labels for hard cases
● Bedroom
● Terrace
● Desk
● Vegetation
● Do we have enough images that combine these things?
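One way to answer the question above is to count how often labels occur together in the annotated images. The sketch below is an illustration under assumptions: the function name, the sample annotations, and the threshold of 2 are hypothetical and do not come from the talk.

```python
from collections import Counter
from itertools import combinations


def label_combination_counts(image_labels, size=2):
    """Count how many images contain each combination of `size` labels."""
    counts = Counter()
    for labels in image_labels:
        # Sort and deduplicate so each combination has a single canonical key.
        for combo in combinations(sorted(set(labels)), size):
            counts[combo] += 1
    return counts


# Hypothetical annotations, one list of labels per image.
annotations = [
    ["bedroom", "desk"],
    ["bedroom", "terrace", "vegetation"],
    ["terrace", "vegetation"],
    ["bedroom", "desk", "vegetation"],
]

counts = label_combination_counts(annotations)
# Combinations seen in fewer than 2 images are candidates for more labeling.
rare = sorted(combo for combo, n in counts.items() if n < 2)
print(rare)
```

Combinations with few examples point to where additional labeled images are likely to help the classifier most.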
Labels for hard cases
● Should we have added ‘neon lights’ to our taxonomy?
● How many of these do we have?
● Should we invest in this?
Object detection
Getting more out of the humans in the loop
“Anybody that is trying to solve the problem of image tagging within a company
ends up rediscovering ‘active learning’, which is just using your model to guide
your labeling. Why should we be labeling everything if the machine is only doing
mistakes on these two hard classes?”
Jeremy Howard
● Services like Amazon SageMaker Ground Truth and human labeling in the Google Vision API platform make this easier
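The idea of using the model to guide labeling can be sketched with least-confidence sampling, one common active learning strategy. This is an assumption-laden illustration, not the method used in the talk: the function name, image ids, and probabilities are hypothetical.

```python
def least_confident(predictions, budget):
    """Return ids of the `budget` images whose top class probability is lowest.

    `predictions` maps an image id to the model's class probabilities.
    """
    ranked = sorted(predictions.items(), key=lambda item: max(item[1]))
    return [image_id for image_id, _ in ranked[:budget]]


# Hypothetical model outputs for four unlabeled images.
predictions = {
    "img_001": [0.98, 0.01, 0.01],
    "img_002": [0.40, 0.35, 0.25],
    "img_003": [0.55, 0.44, 0.01],
    "img_004": [0.90, 0.05, 0.05],
}

to_label = least_confident(predictions, budget=2)
print(to_label)  # prints "['img_002', 'img_003']"
```

Only the images the model is unsure about are sent to human annotators, so the labeling budget concentrates on the hard classes.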
Summary
● Creating value starts with a careful consideration of the business problem
:)
https://datascienceretreat.com/
https://www.meetup.com/Berlin-Computer-Vision-Group/