Reproducible data science and business solutions

"Reproducible data science and business solutions" presentation by Antonio Rueda-Toicen for the Women and Diversity in Economics Group at the University of San Francisco, 21/04/2021.

  1. 1. From reproducible data science to business solutions (April 21st, 2021)
  2. 2. We’ll be talking about: ● Translation of business problems to technical solutions ● Secure medical records ● Problems in computer vision ○ Image quality enhancement aka ‘beautification’ ○ Image similarity evaluation aka ‘matching’ ○ Image classification aka ‘tagging’
  3. 3. About me: Antonio Rueda-Toicen, Senior Data Scientist at Parkling GmbH ● Work on computer vision ● Background in computer science & biomedical applications ● Previously worked in academia, now teach data science at DSR and Thinkful ● Currently host the Berlin Computer Vision Group (look us up on Meetup!)
  4. 4. Airmed Foundation: secure medical records with IPFS and Hyperledger Fabric. https://airmedfoundation.thechain.tech/
  5. 5. Airmed Foundation: secure medical records with IPFS and Hyperledger Fabric. https://airmedfoundation.thechain.tech/
  6. 6. Airmed Foundation: secure medical records with IPFS and Hyperledger Fabric. https://github.com/the-chain/airmedfoundation-terminal
  7. 7. What is ‘computer vision’? ● What a human sees ● What the computer ‘sees’
  8. 8. Why do we do computer vision at HomeToGo? ● We are a search engine for vacation rentals ● We have 17 million offers and hundreds of millions of images, the largest vacation rental inventory in the world ● Users want to envision the experience of a rental before booking
  9. 9. Image quality enhancement aka ‘beautification’
  10. 10. Industry story: the Airbnb case
  11. 11. Industry story: the Airbnb case. https://www.airbnb.com/professional_photography
  12. 12. Industry story: the Airbnb case. https://www.airbnb.com/professional_photography
  13. 13. Why do we need image beautification at HomeToGo?
  14. 14. Problem: we don’t control image acquisition
  15. 15. How does a change in image quality look? ● iPhone 3GS camera: 3 MP, 2048 x 1536 image size (original vs. blurred) ● Canon 70D (DSLR camera): 20 MP, 3648 x 2432 image size (original vs. blurred)
  16. 16. Industry’s current practices for enhancing images
  17. 17. Our use of GANs
  18. 18. Let’s look at some beautified images
  19. 19. Let’s look at some beautified images
  20. 20. Let’s look at some beautified images
  21. 21. Image Similarity Evaluation aka ‘Matching’
  22. 22. Why do we need to match offers? ● Inventory understanding (we have a lot of it!) ● Providing the best deals for our users (sample use case: strike prices)
  23. 23. Evaluating similarity ● Semantic similarity can differ from perceptual similarity ● We use a variety of distance and similarity metrics ● We also ensemble different models in a deduplication pipeline
  24. 24. Perceptual hashing: 94088af86c03827 vs. 94088af86c03827, edit distance = 0
  25. 25. Perceptual hashing: 94088af86c03827 vs. 94088af86c03899, edit distance = 2
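The two hash comparisons on slides 24 and 25 can be reproduced with a short Python sketch. It assumes that the "edit distance" on the slides counts the hexadecimal characters that differ position by position (a Hamming distance over characters), which is consistent with the values 0 and 2 shown. The function names are illustrative and do not come from the presentation. A bit-level variant is included as well, since perceptual hashes are often compared bit by bit; whether the presenter's pipeline does so is not stated.

```python
def char_distance(hash_a: str, hash_b: str) -> int:
    """Count the positions where two equal-length hex hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def bit_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 in every bit position where the hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


# Hash values copied from the slides.
print(char_distance("94088af86c03827", "94088af86c03827"))  # 0
print(char_distance("94088af86c03827", "94088af86c03899"))  # 2
```

A small distance suggests near-duplicate images; the threshold that separates "duplicate" from "different" has to be tuned on labeled data.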
  26. 26. How we evaluate our matching algorithms ● True positive = duplicate labeled as duplicate ● True negative = non-duplicate labeled as non-duplicate ● False positive = non-duplicate labeled as duplicate ● False negative = duplicate labeled as non-duplicate
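The four outcomes defined on slide 26 can be counted directly from ground-truth and predicted labels. The sketch below is a minimal illustration, not the presenter's evaluation code. Precision and recall are not named on the slide; they are included here as an assumption because they are common summaries of these counts. The sample labels are made up.

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for boolean 'is duplicate' labels."""
    tp = tn = fp = fn = 0
    for is_dup, said_dup in zip(actual, predicted):
        if is_dup and said_dup:
            tp += 1  # duplicate labeled as duplicate
        elif not is_dup and not said_dup:
            tn += 1  # non-duplicate labeled as non-duplicate
        elif said_dup:
            fp += 1  # non-duplicate labeled as duplicate
        else:
            fn += 1  # duplicate labeled as non-duplicate
    return tp, tn, fp, fn


def precision(tp, fp):
    """Share of predicted duplicates that really are duplicates."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of real duplicates that the matcher found."""
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical labels for six image pairs (True = duplicate).
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)  # 2 2 1 1
print(precision(tp, fp), recall(tp, fn))
```

Precision drops as false positives grow, which makes it a natural metric to watch when wrongly merged offers are the costlier error.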
  27. 27. Beware of false positives
  28. 28. Convolutional neural networks as feature extractors
  29. 29. Convolutional neural networks as feature extractors: cosine similarity = 0.65
  30. 30. Convolutional neural networks as feature extractors: cosine similarity = 0.99
  31. 31. Convolutional neural networks as feature extractors: cosine similarity = 0.99
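The cosine similarity values on slides 29 to 31 compare feature vectors extracted by a convolutional network. The sketch below shows only the similarity computation, using the standard library. The short vectors are made-up stand-ins for real network embeddings, which typically have hundreds or thousands of dimensions, and the function name is illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: the second vector is a scaled copy of the first,
# so the two point in the same direction.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Orthogonal vectors share no direction at all.
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, unrelated)
```

Values close to 1 indicate embeddings that point in nearly the same direction, as in the 0.99 examples above, while lower values such as 0.65 indicate less similar images.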
  32. 32. Image classification aka ‘Tagging’
  33. 33. Image classification: what we see ● Outdoor ● Building ● Snow
  34. 34. Image classification: what the computer “sees” ● Outdoor ● Building ● Snow? Do we care about snow? ○ Enough of these images need to be shown to the algorithm
  35. 35. Why do we do image classification? ● Inventory understanding ○ How many of our offers have pools, balconies, sea views? ○ Which images have better conversion rates? ● Targeted advertising (SEO, CRM) ○ Newsletters ○ SEO landing pages
  36. 36. What do users care about? ● We do user research to define data taxonomies ● We also define which rules are convenient/feasible for our algorithms ○ E.g. ‘if the sky is visible but we are looking at it through a window, the image should be labeled as “indoor”’
  37. 37. ResNet
  38. 38. Labels for hard cases ● Bedroom ● Terrace ● Desk ● Vegetation ● Do we have enough images that combine these things?
  39. 39. Labels for hard cases ● Should we have added ‘neon lights’ to our taxonomy? ● How many of these things do we have? ● Should we invest in this?
  40. 40. Object detection
  41. 41. Object detection
  42. 42. Getting more out of the humans in the loop ● “Anybody that is trying to solve the problem of image tagging within a company ends up rediscovering ‘active learning’, which is just using your model to guide your labeling. Why should we be labeling everything if the machine is only doing mistakes on these two hard classes?” (Jeremy Howard) ● Services like Amazon SageMaker Ground Truth and human labeling in the Google Vision API platform make this easier
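The idea of "using your model to guide your labeling" from slide 42 can be sketched with least-confidence sampling, one common active learning strategy. The presentation does not say which strategy was used, so this choice is an assumption, and the function name and probabilities below are hypothetical. The sketch sends to human labelers the images whose top predicted class probability is lowest.

```python
def least_confident(probabilities, budget):
    """Return indices of the `budget` images with the lowest top-class probability."""
    confidence = [max(p) for p in probabilities]
    # Sort image indices from least to most confident prediction.
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:budget]


# Hypothetical class probabilities from a three-class image tagger.
predictions = [
    [0.98, 0.01, 0.01],  # confident, no need to label
    [0.40, 0.35, 0.25],  # very uncertain
    [0.55, 0.44, 0.01],  # torn between two classes
    [0.90, 0.05, 0.05],  # fairly confident
]

print(least_confident(predictions, 2))  # [1, 2]
```

With a labeling budget of two, only the two uncertain images go to annotators, and the confident predictions are left alone.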
  43. 43. Summary ● Creating value starts with a careful consideration of the business problem :)
  44. 44. https://datascienceretreat.com/
  45. 45. https://www.meetup.com/Berlin-Computer-Vision-Group/