Deep Representation: Building a Semantic Image Search Engine
Emmanuel Ameisen
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/semantic-search-engine
Presented at QCon San Francisco
www.qconsf.com
PINTEREST SEARCH
IMAGE SEARCH ENGINE
IMAGE TAGGING
thenextweb.com
BACKGROUND
▰ Why am I speaking about this?
ABOUT INSIGHT
7-Week Fellowship in
DATA SCIENCE
DATA ENGINEERING
HEALTH DATA
ARTIFICIAL INTELLIGENCE
PRODUCT MANAGEMENT
DEVOPS
SILICON VALLEY & SAN FRANCISCO
NEW YORK
BOSTON
SEATTLE
TORONTO
+ REMOTE
www.insightdata.ai
INSIGHT DATA – FELLOW PROJECTS
FASHION CLASSIFIER
AUTOMATIC REVIEW GENERATION
READING TEXT IN VIDEOS
HEART SEGMENTATION
SUPPORT REQUEST CLASSIFICATION
SPEECH UPSAMPLING
INSIGHT FELLOWS ARE DATA SCIENTISTS AND DATA ENGINEERS EVERYWHERE
1,600+ INSIGHT ALUMNI
400+ COMPANIES
ON THE MENU
▰ A quick overview of Computer Vision (CV) tasks and challenges
▰ Natural Language Processing (NLP) tasks and challenges
▰ Challenges in combining both
▰ Representation learning in CV
▰ Representation learning in NLP
▰ Combining both
▰ Massive models
▻ Trained on datasets of 1M+ images
▻ Training takes multiple days
▰ Automates feature engineering
▰ Use cases
▻ Fashion
▻ Security
▻ Medicine
▻ …
CONVOLUTIONAL NEURAL NETWORKS (CNN)
▰ Incorporates local and global information
▰ Use cases
▻ Medical
▻ Security
▻ Autonomous Vehicles
EXTRACTING INFORMATION
@arthur_ouaknine
▰ Pose Estimation
▰ Scene Parsing
▰ 3D Point cloud estimation
ADVANCED APPLICATIONS
Insight Fellow Project with Piccolo
Felipe Mejia
▰ Traditional NLP tasks
▻ Classification (sentiment analysis, spam detection, code classification)
▰ Extracting Information
▻ Named Entity Recognition, Information extraction
▰ Advanced applications
▻ Translation, sequence to sequence learning
NLP
▰ Sequence to sequence models are still often too rough to be deployed, even with sizable datasets
▻ Recognized Tosh as a swear word
▰ They can be used efficiently for data augmentation
▻ Paired with other latent approaches
SENTENCE PARAPHRASING
Victor Suthichai
▰ Prime language model with features extracted from CNN
▰ Feed to an NLP language model
▰ End-to-end
▻ Elegant
▻ Hard to debug and validate
▻ Hard to productionize
IMAGE CAPTIONING
A horse is standing in a field with a fence in the background.
CODE GENERATION
Ashwin Kumar
§ Harder problem for humans
- Anyone can describe an image
- Coding takes specific training
§ We can solve it using a similar model
§ The trick is in getting the data!
▰ These methods mix and match different architectures
▰ The combined representation is often learned implicitly
▻ Hard to cache and optimize to re-use across services
▻ Hard to validate and do QA on
▰ The models are entangled
▻ What if we want to learn a simple joint representation?
BUT DOES IT SCALE?
Image Search
Goals
§ Searching for similar images to an input image
- Computer Vision: (Image → Image)
§ Searching for images using text & generating tags for images
- Computer Vision + Natural Language Processing: (Image ↔ Text)
§ Bonus: finding similar words to an input word
- Natural Language Processing: (Text → Text)
Image Based Search
Let’s build this!
Dataset
Credit to Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier for the dataset.
§ 1000 images
- 20 classes, 50 images per class
§ 3 orders of magnitude smaller than usual deep learning datasets
§ Noisy
WHICH CLASS?
DATA PROBLEMS
Bottle ☹
A FEW APPROACHES
§ Ways to think about searching for similar images
IF WE HAD INFINITE DATA
§ Train on all images
§ Pros:
- One Forward Pass (fast inference)
§ Cons:
- Hard to optimize
- Poor scaling
- Frequent Retraining
SIMILARITY MODEL
§ Train on each image pair
§ Pros:
- Scales to large datasets
§ Cons:
- Slow
- Does not work for text
- Needs good examples
EMBEDDING MODEL
§ Find embedding for each image
§ Calculate ahead of time
§ Pros:
- Scalable
- Fast
§ Cons:
- Simple representations
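The embedding approach above can be sketched in a few lines. This is a minimal illustration, not the talk's code: the file names and the 3-dimensional vectors are made up, and a real system would store vectors taken from a layer of a pre-trained CNN. Embeddings are computed once, ahead of time, and a query is just a similarity ranking against them.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, embeddings, k=5):
    """Return the k (name, score) pairs closest to `query`, best first."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical precomputed embeddings (assumption: real ones come from a CNN).
EMBEDDINGS = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "bottle_1.jpg": [0.0, 0.1, 0.9],
}

results = most_similar(EMBEDDINGS["cat_1.jpg"], EMBEDDINGS, k=2)
for name, score in results:
    print(name, round(score, 3))
```

The query image ranks first against itself, followed by the other cat. This brute-force scan is linear in the number of images, which is why a fast index matters at scale.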
Mikolov et al., 2013
WORD EMBEDDINGS
LEVERAGING A PRE-TRAINED MODEL
HOW AN EMBEDDING LOOKS
PROXIMITY SEARCH IS FAST
How do you find the 5 most similar images to a given one when you
have over a million users?
▰ Fast index search
▰ Spotify uses Annoy (we will as well)
▰ Flickr uses LOPQ
▰ NMSLIB is also very fast
▰ Some of these make the queries approximate in order to make them fast
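To make "approximate queries" concrete, here is a rough sketch of one such scheme, random-hyperplane hashing. It is not Annoy's API or its exact algorithm (Annoy builds a forest of random-projection trees); the function names, the 16-dimensional random vectors, and the choice of 6 hyperplanes are all assumptions for illustration. Vectors that fall on the same side of every hyperplane share a bucket, and a query only scans its own bucket.

```python
import math
import random

def random_hyperplanes(dim, n_planes, rng):
    """Each random hyperplane contributes one bit to the bucket key."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]

def bucket_key(vec, planes):
    """Record which side of each hyperplane the vector falls on."""
    return tuple(sum(v * p for v, p in zip(vec, plane)) >= 0.0
                 for plane in planes)

def build_index(vectors, planes):
    """Group item names by bucket so a query scans one bucket only."""
    index = {}
    for name, vec in vectors.items():
        index.setdefault(bucket_key(vec, planes), []).append(name)
    return index

def approximate_neighbors(query, vectors, index, planes, k=5):
    """Rank only the candidates sharing the query's bucket (may miss some)."""
    candidates = index.get(bucket_key(query, planes), [])
    def score(name):
        # Dot product equals cosine similarity for unit-length vectors.
        return sum(q * x for q, x in zip(query, vectors[name]))
    return sorted(candidates, key=score, reverse=True)[:k]

def unit(vec):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

rng = random.Random(0)
# Hypothetical data: 1000 random unit vectors standing in for image embeddings.
VECTORS = {f"img_{i}": unit([rng.gauss(0.0, 1.0) for _ in range(16)])
           for i in range(1000)}
PLANES = random_hyperplanes(16, 6, rng)
INDEX = build_index(VECTORS, PLANES)
neighbors = approximate_neighbors(VECTORS["img_42"], VECTORS, INDEX, PLANES)
print(neighbors)
```

The trade-off is visible in the code: a true neighbor that lands in a different bucket is never considered, so recall is given up for speed. Production libraries reduce that loss with multiple trees or tables.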
PRETTY IMPRESSIVE!
IN OUT
FOCUSING OUR SEARCH
§ Sometimes we are only interested in part of the image.
§ For example, given an image of a cat and a bottle, we might be only interested in similar cats, not similar bottles.
§ How do we incorporate this information?
IMPROVING RESULTS: STILL NO TRAINING
§ Computationally expensive approach:
- Object detection model first
- (We don’t do this)
- Image search on a cropped image
- (We don’t do this)
§ Semi-supervised approach:
- Hacky, but efficient!
- Re-weighting the activations
- Only use the class of interest to re-weight embeddings
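One plausible reading of the re-weighting trick, sketched under assumptions: each activation in the embedding is scaled by the weight that the classifier's final layer assigns to it for the class of interest. The slides do not spell out the exact operation, so the element-wise product, the 4-dimensional vectors, and the weight values below are illustrative only.

```python
def reweight_embedding(embedding, class_weights):
    """Scale each activation by how much the chosen class relies on it."""
    if len(embedding) != len(class_weights):
        raise ValueError("embedding and class weights must have equal length")
    return [a * w for a, w in zip(embedding, class_weights)]

# Hypothetical embedding of an image that shows both a cat and a bottle.
image_embedding = [0.7, 0.6, 0.1, 0.8]
# Hypothetical final-layer weights for "cat": dimension 3 is irrelevant to cats.
cat_weights = [1.0, 0.9, 0.2, 0.0]

cat_focused = reweight_embedding(image_embedding, cat_weights)
print(cat_focused)
```

For the search to stay consistent, the same class weights would have to be applied to the indexed embeddings as well as to the query. No training is needed, but only classes the pre-trained model already knows can be used.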
EVEN BETTER
IN OUT
GENERALIZING
§ We have added some ability to guide the search, but it is limited to classes our model was initially trained on
§ We would like to be able to use any word
§ How do we combine words and images?
Mikolov et al., 2013
WORD EMBEDDINGS
SEMANTIC TEXT!
§ Load a set of pre-trained vectors (GloVe)
- Wikipedia data
- Semantic relationships
§ One big issue:
- The embeddings for images are of size 4096
- While those for words are of size 300
- And the two models were trained in different ways
§ What we need: Joint model!
Inspiration
TIME TO TRAIN
Image → Image    Image → Text
IMAGE → TEXT
§ Re-train model to predict the word vector
- i.e. 300-length vector associated with cat
§ Training
- Takes more time per example than image → class
- But much faster than on ImageNet (7 hours, no GPU)
§ Important to note
- Training data can be very small: ~1000 images
- Minuscule compared to ImageNet (1+ million images)
§ Once model is trained
- Build a new fast index of images
- Save to disk
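A toy version of "predict the word vector" is sketched below. It is a stand-in, not the talk's model: a single linear layer replaces the re-trained network, squared error is an assumed loss, and the 3-dimensional "image features" and 2-dimensional "word vectors" are invented to keep the example small (the slides use 4096 and 300).

```python
import random

def predict(weights, features):
    """Linear map from image features into word-embedding space."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def train(examples, in_dim, out_dim, epochs=200, lr=0.1, seed=0):
    """Fit the map with SGD on squared error to each label's word vector."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    for _ in range(epochs):
        for features, target in examples:
            output = predict(weights, features)
            for i in range(out_dim):
                error = output[i] - target[i]
                for j in range(in_dim):
                    weights[i][j] -= lr * error * features[j]
    return weights

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical word vectors and labeled image features.
WORD_VECTORS = {"cat": [1.0, 0.0], "bottle": [0.0, 1.0]}
EXAMPLES = [
    ([1.0, 0.0, 0.2], WORD_VECTORS["cat"]),
    ([0.9, 0.1, 0.3], WORD_VECTORS["cat"]),
    ([0.0, 1.0, 0.2], WORD_VECTORS["bottle"]),
    ([0.1, 0.9, 0.1], WORD_VECTORS["bottle"]),
]

weights = train(EXAMPLES, in_dim=3, out_dim=2)
predicted = predict(weights, [0.95, 0.05, 0.25])  # an unseen cat-like image
print(predicted)
```

Once every image is mapped this way, images and words live in the same space, so the index built from these predicted vectors can be queried with any word vector.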
How do you think this model will perform?
IMAGE → TEXT
GENERALIZED IMAGE SEARCH WITH MINIMAL DATA
IN:“DOG” OUT
SEARCH FOR WORD NOT IN DATASET
IN:“OCEAN” OUT
SEARCH FOR WORD NOT IN DATASET
IN:“STREET” OUT
MULTIPLE WORDS!
IN:“CAT SOFA” OUT
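A sketch of how a text query such as "CAT SOFA" could be run against image vectors that live in word-embedding space. The slides do not say how multiple words are combined; averaging the word vectors is an assumption here, and the tiny vocabulary, file names, and 3-dimensional vectors are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def embed_query(text, word_vectors):
    """Average the vectors of the known words in the query (assumed scheme)."""
    vectors = [word_vectors[w] for w in text.lower().split()
               if w in word_vectors]
    if not vectors:
        raise ValueError("no known words in query")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def search(text, word_vectors, image_vectors, k=5):
    """Return the names of the k images closest to the query text."""
    query = normalize(embed_query(text, word_vectors))
    scored = []
    for name, vec in image_vectors.items():
        score = sum(q * x for q, x in zip(query, normalize(vec)))
        scored.append((name, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]

# Hypothetical word vectors and predicted image vectors in the same space.
WORD_VECTORS = {
    "cat": [1.0, 0.0, 0.0],
    "sofa": [0.0, 1.0, 0.0],
    "ocean": [0.0, 0.0, 1.0],
}
IMAGE_VECTORS = {
    "cat_on_sofa.jpg": [0.7, 0.7, 0.0],
    "cat_outside.jpg": [0.9, 0.1, 0.2],
    "beach.jpg": [0.1, 0.0, 0.9],
}

top = search("CAT SOFA", WORD_VECTORS, IMAGE_VECTORS, k=2)
print(top)
```

Because the query is only a vector, a word that never appeared as a training label (such as "ocean" here) can still retrieve images, as long as the word has a pre-trained embedding.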
Learn More:
Find the repo on Github!
Next steps
§ Incorporating user feedback
- Most real world image search systems use user clicks as a signal
§ Capturing domain specific aspects
- Users often mean different things by "similar"
§ Keep the conversation going
- Reach me on Twitter @EmmanuelAmeisen
www.insightdata.ai/apply
EMMANUEL AMEISEN
Head of AI, ML Engineer
@emmanuelameisen
emmanuel@insightdata.ai
bit.ly/imagefromscratch
CV Approaches
White-box Algorithms Black-Box Algorithms
@Andrey Nikishaev
▰ NLP Classification is generally more shallow
▻ Logistic Regression/Naïve Bayes
▻ Two layer CNN
▰ This is starting to change
▻ The triumph of pre-training and transfer learning
CLASSIFICATION
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 

Deep Representation: Building a Semantic Image Search Engine

  • 1. Deep Representation: Building a Semantic Image Search Engine Emmanuel Ameisen
  • 2. InfoQ.com: News & Community Site • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ semantic-search-engine
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  • 7. BACKGROUND ▰ Why am I speaking about this?
  • 8. ABOUT INSIGHT DATA SCIENCE DATA ENGINEERING HEALTH DATA ARTIFICIAL INTELLIGENCE 7-Week Fellowship in + REMOTE SILICON VALLEY & SAN FRANCISCO NEW YORK BOSTON SEATTLE PRODUCT MANAGEMENT TORONTO DEVOPS www.insightdata.ai
  • 9. INSIGHT DATA – FELLOW PROJECTS FASHION CLASSIFIER AUTOMATIC REVIEW GENERATION READING TEXT IN VIDEOS HEART SEGMENTATION SUPPORT REQUEST CLASSIFICATION SPEECH UNSAMPLING
  • 11. INSIGHT FELLOWS ARE DATA SCIENTISTS AND DATA ENGINEERS EVERYWHERE 400+ COMPANIES
  • 12. ON THE MENU ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representation learning in CV ▰ Representation learning in NLP ▰ Combining both
  • 14. ▰ Massive models ▻ Dataset of 1M+ images ▻ For multiple days ▰ Automates feature engineering ▰ Use cases ▻ Fashion ▻ Security ▻ Medicine ▻ … CONVOLUTIONAL NEURAL NETWORKS (CNN)
  • 15. ▰ Incorporates local and global information ▰ Use cases ▻ Medical ▻ Security ▻ Autonomous Vehicles EXTRACTING INFORMATION @arthur_ouaknine
  • 16. ▰ Pose Estimation ▰ Scene Parsing ▰ 3D Point cloud estimation ADVANCED APPLICATIONS Insight Fellow Project with Piccolo Felipe Mejia
  • 17. ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representation learning in CV ▰ Representation learning in NLP ▰ Combining both ON THE MENU
  • 18. ▰ Traditional NLP tasks ▻ Classification (sentiment analysis, spam detection, code classification) ▰ Extracting Information ▻ Named Entity Recognition, Information extraction ▰ Advanced applications ▻ Translation, sequence to sequence learning NLP
  • 19. ▰ Sequence to sequence models are still often too rough to be deployed, even with sizable datasets ▻ Recognized Tosh as a swear word ▰ They can be used efficiently for data augmentation ▻ Paired with other latent approaches SENTENCE PARAPHRASING Victor Suthichai
  • 20. ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representation learning in CV ▰ Representation learning in NLP ▰ Combining both ON THE MENU
  • 21. ▰ Prime language model with features extracted from CNN ▰ Feed to an NLP language model ▰ End-to-end ▻ Elegant ▻ Hard to debug and validate ▻ Hard to productionize IMAGE CAPTIONING A horse is standing in a field with a fence in the background.
  • 22. CODE GENERATION Ashwin Kumar § Harder problem for humans - Anyone can describe an image - Coding takes specific training § We can solve it using a similar model § The trick is in getting the data!
  • 23. ▰ These methods mix and match different architectures ▰ The combined representation is often learned implicitly ▻ Hard to cache and optimize to re-use across services ▻ Hard to validate and do QA on ▰ The models are entangled ▻ What if we want to learn a simple joint representation? BUT DOES IT SCALE?
  • 25. Goals § Searching for similar images to an input image - Computer Vision: (Image → Image) § Searching for images using text & generating tags for images - Computer Vision + Natural Language Processing: (Image ↔ Text) § Bonus: finding similar words to an input word - Natural Language Processing: (Text → Text)
  • 26. ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representation learning in CV ▰ Representation learning in NLP ▰ Combining both ON THE MENU
  • 28. Dataset Credit to Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier for the dataset. § 1000 images - 20 classes, 50 images per class § 3 orders of magnitude smaller than usual deep learning datasets § Noisy
  • 31. A FEW APPROACHES § Ways to think about searching for similar images
  • 32. IF WE HAD INFINITE DATA § Train on all images § Pros: - One Forward Pass (fast inference) § Cons: - Hard to optimize - Poor scaling - Frequent Retraining
  • 33. SIMILARITY MODEL § Train on each image pair § Pros: - Scales to large datasets § Cons: - Slow - Does not work for text - Needs good examples
  • 34. EMBEDDING MODEL § Find embedding for each image § Calculate ahead of time § Pros: - Scalable - Fast § Cons: - Simple representations
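The embedding approach can be sketched roughly as follows, assuming each image has already been mapped to a vector ahead of time. The three-dimensional vectors and file names below are made up for illustration (real embeddings would come from a CNN layer and be much larger), and cosine similarity is an assumed choice of metric rather than something the slide specifies.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, embeddings, k=5):
    """Return the names of the k stored embeddings closest to `query`."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings, computed once and stored for later queries.
EMBEDDINGS = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "dog_1.jpg": [0.1, 0.9, 0.1],
    "boat_1.jpg": [0.0, 0.1, 0.9],
}

nearest = most_similar(EMBEDDINGS["cat_1.jpg"], EMBEDDINGS, k=2)
```

Because the vectors are computed ahead of time, a query only costs one pass over the stored embeddings, or less with an index.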
  • 35. Mikolov et al. 2013 WORD EMBEDDINGS
  • 38. PROXIMITY SEARCH IS FAST How do you find the 5 most similar images to a given one when you have over a million users? ▰ Fast index search ▰ Spotify uses annoy (we will as well) ▰ Flickr uses LOPQ ▰ Nmslib is also very fast ▰ Some rely on making the queries approximate in order to make them fast
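To illustrate the idea of making queries approximate in order to make them fast, here is a small sketch of random-hyperplane hashing. This is a deliberately simple stand-in, not the algorithm used by annoy, LOPQ, or Nmslib; every function name below is hypothetical. Vectors that fall on the same side of every random hyperplane share a bucket, so a query only inspects one bucket instead of the whole collection, at the cost of sometimes missing a true neighbor.

```python
import random
from collections import defaultdict

def make_hyperplanes(dim, n_planes, seed=0):
    """Draw n_planes random normal vectors, one per hyperplane through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]

def signature(vector, hyperplanes):
    """One boolean per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        sum(v * h for v, h in zip(vector, plane)) >= 0 for plane in hyperplanes
    )

def build_index(vectors, hyperplanes):
    """Group item ids by signature so similar directions tend to share a bucket."""
    buckets = defaultdict(list)
    for item_id, vector in vectors.items():
        buckets[signature(vector, hyperplanes)].append(item_id)
    return buckets

def candidates(query, buckets, hyperplanes):
    """Approximate lookup: only items in the query's own bucket are returned."""
    return buckets.get(signature(query, hyperplanes), [])

# Toy data, made up for illustration.
PLANES = make_hyperplanes(dim=3, n_planes=4, seed=0)
VECTORS = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [-1.0, 0.0, 0.0]}
INDEX = build_index(VECTORS, PLANES)
found = candidates([2.0, 0.0, 0.0], INDEX, PLANES)
```

More hyperplanes give smaller buckets and faster queries but more missed neighbors. Production libraries manage this trade-off with more elaborate structures.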
  • 40. FOCUSING OUR SEARCH § Sometimes we are only interested in part of the image. § For example, given an image of a cat and a bottle, we might be only interested in similar cats, not similar bottles. § How do we incorporate this information?
  • 41. IMPROVING RESULTS: STILL NO TRAINING § Computationally expensive approach: - Object detection model first - (We don’t do this) - Image search on a cropped image - (We don’t do this) § Semi-Supervised approach: - Hacky, but efficient! - Re-weighting the activations - Only use the class of interest to re-weight embeddings
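One possible reading of the re-weighting step, sketched under assumptions: take the weights that the classifier's final layer assigns to the class of interest and multiply the image embedding by them element-wise, so that dimensions relevant to that class dominate the similarity search. The function name and the toy numbers are hypothetical, and the slide does not confirm that this is exactly how the weights are applied.

```python
def reweight_embedding(embedding, class_weights):
    """Scale each embedding dimension by the final-layer weight for one class.

    `class_weights` is assumed to hold one weight per embedding dimension,
    taken from the classifier row for the class of interest.
    """
    if len(embedding) != len(class_weights):
        raise ValueError("embedding and class_weights must have the same length")
    return [activation * weight for activation, weight in zip(embedding, class_weights)]

# Toy example: the second dimension is irrelevant to the hypothetical "cat" class.
image_embedding = [0.5, 2.0, 1.0]
cat_weights = [1.0, 0.0, 0.5]
cat_focused = reweight_embedding(image_embedding, cat_weights)
```

The re-weighted vectors would then be indexed and searched like any other embedding, with no extra training.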
  • 43. ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representation learning in CV ▰ Representation learning in NLP ▰ Combining both ON THE MENU
  • 44. GENERALIZING § We have added some ability to guide the search, but it is limited to classes our model was initially trained on § We would like to be able to use any word § How do we combine words and images?
  • 45. Mikolov et al. 2013 WORD EMBEDDINGS
  • 46. SEMANTIC TEXT! § Load a set of pre-trained vectors (GloVe) - Wikipedia data - Semantic relationships § One big issue: - The embeddings for images are of size 4096 - While those for words are of size 300 - And both models were trained in a different fashion § What we need: Joint model!
  • 47. ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representation learning in CV ▰ Representation learning in NLP ▰ Combining both ON THE MENU
  • 49. TIME TO TRAIN Image → Image Image → Text
  • 50. IMAGE → TEXT § Re-train model to predict the word vector - i.e. 300-length vector associated with cat § Training - Takes more time per example than image → class - But much faster than on ImageNet (7 hours, no GPU) § Important to note - Training data can be very small: ~1000 images - Minuscule compared to ImageNet (1+ million images) § Once model is trained - Build a new fast index of images - Save to disk How do you think this model will perform?
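A minimal sketch of what predicting a word vector implies, assuming cosine distance as the training objective (the slide does not name the loss, so this is an assumption). Once a model outputs vectors in the word-embedding space, tagging an image reduces to finding the nearest word vector. The two-dimensional vectors stand in for the 300-dimensional ones, and all names are hypothetical.

```python
import math

def cosine_distance(predicted, target):
    """1 - cosine similarity: 0 when directions match, 1 when orthogonal."""
    dot = sum(p * t for p, t in zip(predicted, target))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_p * norm_t)

def tag_for(predicted, word_vectors):
    """Return the word whose vector is closest to the model's prediction."""
    return min(word_vectors, key=lambda word: cosine_distance(predicted, word_vectors[word]))

# Toy stand-ins for pre-trained word vectors.
WORD_VECTORS = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

# A hypothetical model output for a cat photo.
predicted_vector = [0.9, 0.2]
tag = tag_for(predicted_vector, WORD_VECTORS)
```

The same distance works in the other direction: a text query's word vector can be compared against the predicted vectors of all indexed images, which is what allows searching for words that never appeared as labels in the training set.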
  • 52. GENERALIZED IMAGE SEARCH WITH MINIMAL DATA IN:“DOG” OUT
  • 53. SEARCH FOR WORD NOT IN DATASET IN:“OCEAN” OUT
  • 54. SEARCH FOR WORD NOT IN DATASET IN:“STREET” OUT
  • 57. Learn More: Find the repo on Github!
  • 58. Next steps § Incorporating user feedback - Most real-world image search systems use user clicks as a signal § Capturing domain-specific aspects - Oftentimes, users have different meanings for similarity § Keep the conversation going - Reach me on Twitter @EmmanuelAmeisen
  • 59. www.insightdata.ai/apply EMMANUEL AMEISEN Head of AI, ML Engineer @emmanuelameisen emmanuel@insightdata.ai bit.ly/imagefromscratch
  • 60. CV Approaches White-box Algorithms Black-Box Algorithms @Andrey Nikishaev
  • 61. ▰ NLP Classification is generally more shallow ▻ Logistic Regression/Naïve Bayes ▻ Two layer CNN ▰ This is starting to change ▻ The triumph of pre-training and transfer learning CLASSIFICATION