Tackling Challenges in Computer Vision

In the past few years we have been witnessing incredible
progress in the field of computer vision, mainly due to deep
learning.
Tackling challenges in
computer vision
Augustin Marty
CEO Deepomatic

Deep learning has changed a great deal in how we solve image-related problems.
You need to feed your model examples: give it thousands of images so that it learns to tell categories apart.
ImageNet: the iconic challenge in the world of computer vision, with thousands of categories. Algorithms must place every image into the correct category.
Democratisation of image recognition: the error rate has dropped from 26% to 3% today. For comparison, the error rate of a human is 5%.
DEEP LEARNING
IS THE NEW PARADIGM

Deep learning progress was also made possible because the tech giants started investing massively in it, developing bigger models (with more layers and millions of parameters).
This is a heavy process.
GoogLeNet (2014)
22 LAYERS
ResNet (2015)
152 LAYERS

With bigger and more complex models and algorithms there is a need for great computing power.
Here NVIDIA’s CEO, Jen-Hsun Huang, is presenting an OpenAI supercomputer to Elon Musk.
This is a relatively small box, packed with GPUs that process calculations amazingly fast.
This type of supercomputer is now affordable and accessible, especially since it is available in the cloud.
This progress and computing power have led to interesting applications in image recognition:
OPEN-AI SUPERCOMPUTER

Google’s Show and Tell: an image captioning model.
The model is able to describe the scene with a collection of
verbs and adjectives, like a human does.
Google’s Show and Tell

Style transfer is another example:
Deep learning algorithms can understand the style of a
painting and reproduce it.
Here it understands Van Gogh’s expressionist painting and,
combined with a picture of houses along a river,
reproduces the style, creating a new picture.
Style
Transfer

Video Colouration:
Artificial intelligence colours the video, turning a black and
white video into a coloured one.
How was this achieved?
1. Engineers took thousands of coloured videos and made
them black and white.
2. They then trained the algorithm to understand the
correlation between the B&W videos and the respective
coloured ones.
3. The algorithm was then able to colour new videos that were
originally B&W.
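Step 1 above can be sketched in a few lines. This is only an illustration, not the engineers' actual pipeline: the frame representation (nested lists of RGB tuples), the function names and the BT.601 luma weights used for the grey conversion are all assumptions made for the example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (red, green, blue), each 0-255
ColourFrame = List[List[Pixel]]   # rows of pixels
GreyFrame = List[List[int]]       # rows of grey levels, each 0-255


def to_greyscale(frame: ColourFrame) -> GreyFrame:
    """Convert one colour frame to grey levels (assumed BT.601 luma weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]


def make_training_pairs(
    frames: List[ColourFrame],
) -> List[Tuple[GreyFrame, ColourFrame]]:
    """Pair each B&W input with the original colour frame as its target."""
    return [(to_greyscale(frame), frame) for frame in frames]


# A tiny hypothetical 2x2 frame: red, white, black, blue.
frame = [[(255, 0, 0), (255, 255, 255)],
         [(0, 0, 0), (0, 0, 255)]]
pairs = make_training_pairs([frame])
grey_input, colour_target = pairs[0]
```

The point of the trick is that the labels come for free: every coloured video already contains its own ground truth, so no human annotation is needed for this task.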
Video
Colouration

On specific industry problems, however, those models don’t necessarily work. The following 3 images were run through Microsoft’s online image recognition platform.
Here an automotive part is mistaken for a “close up of a plane” - not quite!
“Close up of a plane”

A terrorist is labelled as “man looking at the ocean”.
Of course there’s an ocean and a man, but first of all he’s
clearly looking away from the ocean. The machine doesn’t
recognise the dark knife in his hands, and can’t properly
identify the face because of the mask.
“Man looking at the ocean”

“Group of men standing on top of a dirt field”:
it’s not wrong: you have a group of men and yes, it’s a dirt road.
But we humans understand the context of this picture better than the algorithm does: this group of men are fighters, they have weapons, and they are probably engaging in an act of war.
All this goes to show that progress is undeniable, but there is still much to do.
When you want to apply these technologies to challenges specific to your industry or company, they might not work. You may therefore think that it’s not for you, that AI and computer vision don’t solve your need.
But…
“Group of men standing on top of a dirt field”

Artificial Intelligence is for everyone, for every company.
The following examples show industry-specific problems
that were solved thanks to computer vision.
AI IS FOR EVERY COMPANY

Sadako Technologies, a Spanish firm, has developed a waste sorting device by combining robotics and computer vision.
It is able to automatically distinguish plastic from other waste on a conveyor belt.
This has promising applications for future waste sorting and management systems, and for other cleaning applications.
CASE STUDY: SADAKO
Waste Sorting

Regaind is a startup that is able to assess image
aesthetics. By analysing the quality of each picture, it selects
the best ones (amongst thousands of pictures taken during a
vacation) and creates photo albums automatically.
CASE STUDY: REGAIND
Image Aesthetics

Coming back to the image of the fighters:
it’s possible, with today’s technology, to develop weapon
detection for images and videos. This is of great use and
importance to military and intelligence services.
CASE STUDY: SECURITY
Weapon Detection

The three previous applications have been developed by
small companies. If they were able to do this, then so can
you, if you follow the right methodology.
There is a secret sauce for tackling image recognition
challenges specific to your industry:
THE SECRET AI SAUCE
TO SOLVE
YOUR PROBLEMS

First, you need a deep learning framework.
These are freely available because they are open source. You just
need an engineer to use them.
A framework

Second ingredient: annotated images.
These must be relevant to the task you are tackling.
The images differ for each use case and problem, so you need to
develop your own dataset.
ANNOTATED DATA

Third ingredient: annotators, the humans in the loop.
HUMANS-IN-THE-LOOP

Assemble the 3 ingredients:
You train an algorithm using your dataset and
framework.
Your first algorithm is applied to never-before-seen images
(it is never perfect at first).
Your algorithm won’t give an answer on all the new images
provided: you need humans-in-the-loop to annotate
those it couldn’t answer (when the machine lacked confidence),
completing the task.
AI plus annotators complete the job and also keep
building the dataset, creating a better algorithm…
THE LOOP
Dataset → Training → Neural network models → Calling humans when the model is not sure → Humans in the loop → Annotation → Dataset
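The loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the confidence threshold of 0.8, the function names, and the toy model and annotator are all hypothetical stand-ins, not part of any real framework.

```python
from typing import Callable, Dict, List, Tuple

CONFIDENCE_THRESHOLD = 0.8  # hypothetical cut-off, to be tuned per use case


def run_loop(
    images: List[str],
    model: Callable[[str], Tuple[str, float]],
    ask_human: Callable[[str], str],
    dataset: List[Tuple[str, str]],
) -> Dict[str, str]:
    """Answer every image; call a human whenever the model is not sure."""
    results = {}
    for image in images:
        label, confidence = model(image)
        if confidence < CONFIDENCE_THRESHOLD:
            # The human completes the job...
            label = ask_human(image)
            # ...and the new annotation grows the dataset for retraining.
            dataset.append((image, label))
        results[image] = label
    return results


# Toy stand-ins for a trained model and a human annotator.
def toy_model(image: str) -> Tuple[str, float]:
    return {"img_1": ("chair", 0.95), "img_2": ("table", 0.40)}[image]


def toy_annotator(image: str) -> str:
    return "sofa"


dataset: List[Tuple[str, str]] = []
results = run_loop(["img_1", "img_2"], toy_model, toy_annotator, dataset)
```

Every image gets an answer, and only the uncertain ones cost human time. Retraining the model on the grown dataset closes the loop.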

This may all seem relatively simple but there’s a catch: you need a very good dataset for it to work well, meaning a
huge number of perfectly annotated images.
Creating these datasets takes lots of time.
BUT…
THERE’S A CATCH

Here’s an example of furniture detection.
To develop an algorithm that detects furniture in images, you need a dataset with boxes around every single item in each image.
Consequently, you need to do this manually at first, making sure the boxes fit tightly around each item and that no object is missed.
This takes 10 minutes for 1 image!
Sadako Technologies, mentioned earlier, needed to do this to train their technology: they put millions of boxes around plastic bottles to create their dataset.
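To make "boxes around every item" concrete, here is one possible way to store such annotations, with a basic sanity check. The field names, the pixel-coordinate convention and the example values are assumptions for illustration only; real annotation formats differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


@dataclass
class AnnotatedImage:
    filename: str
    width: int
    height: int
    boxes: List[Box]


def invalid_boxes(image: AnnotatedImage) -> List[Box]:
    """Return boxes that are empty or extend outside the image."""
    return [
        b for b in image.boxes
        if not (0 <= b.x_min < b.x_max <= image.width
                and 0 <= b.y_min < b.y_max <= image.height)
    ]


# Hypothetical annotation of one 640x480 picture of a living room.
example = AnnotatedImage(
    filename="living_room.jpg",
    width=640,
    height=480,
    boxes=[
        Box("chair", 10, 20, 200, 300),
        Box("table", 600, 100, 700, 200),  # spills past the right edge
    ],
)
bad = invalid_boxes(example)
```

A check like this only catches boxes that are geometrically impossible. Whether a box fits tightly, or whether an object was missed, still needs a human review.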
FURNITURE
DETECTION
(10 min)

Some tasks are even more time-consuming. If you want to
develop algorithms for robotics (automated cars, robots,
drones etc.), they need to understand their entire
environment.
So in this case, to train algorithms you need to determine
what each pixel represents in the image.
This segmentation task takes over an hour.
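A pixel-wise annotation can be pictured as a grid with one class per pixel. The sketch below is illustrative only: the class ids, the class names and the tiny 3x4 mask are made up for the example.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical class ids for an urban scene.
CLASS_NAMES = {0: "road", 1: "car", 2: "pedestrian"}

# One class id per pixel; a real image has hundreds of thousands of pixels.
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 0, 0],
]


def pixels_per_class(mask: List[List[int]],
                     class_names: Dict[int, str]) -> Dict[str, int]:
    """Count how many pixels were assigned to each class."""
    counts = Counter(class_id for row in mask for class_id in row)
    return {class_names[class_id]: n for class_id, n in counts.items()}


def is_fully_labelled(mask: List[List[int]],
                      class_names: Dict[int, str]) -> bool:
    """Check that every pixel carries a known class id."""
    return all(class_id in class_names for row in mask for class_id in row)


counts = pixels_per_class(mask, CLASS_NAMES)
complete = is_fully_labelled(mask, CLASS_NAMES)
```

Because every single pixel needs a decision, the effort grows with the level of detail in the scene, which is why this task takes so much longer than drawing boxes.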
URBAN
SEGMENTATION
(70 min)

The real bottleneck is now dataset creation.
DATA IS AI
BOTTLENECK

Good dataset creation is crucial to speed up the pace of AI
progress.
LACK OF DATA IS SLOWING DOWN AI EXPANSION

There are currently 2 ways of making datasets:
1) Have it done internally by data scientists.
2) Use crowdsourcing, such as Amazon’s Mechanical Turk. This isn’t too bad, but it is time-consuming and you need to do many quality reviews and checks to ensure satisfactory results.
Every data scientist has, at least once, thrown out a dataset due to its poor quality.
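One common form of quality check for crowdsourced labels is to ask several workers to annotate the same image and keep the label only when enough of them agree. The sketch below assumes that approach. The majority-vote rule, the 60% agreement level and the function name are illustrative choices, not something prescribed by Mechanical Turk.

```python
from collections import Counter
from typing import List, Optional


def consolidate(votes: List[str], min_agreement: float = 0.6) -> Optional[str]:
    """Return the majority label, or None if the workers disagree too much."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # send the image back for another review


agreed = consolidate(["bottle", "bottle", "can"])
disputed = consolidate(["bottle", "can", "bag"])
```

Each extra vote multiplies the annotation cost, which is part of why crowdsourcing ends up being time-consuming.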
Solutions
DO IT INTERNALLY: Make your data scientists want to quit
or
AMAZON MECHANICAL TURK: Time consuming, poor quality

Industrialising the dataset creation process is the true way forward for solving each company’s image-related challenges.
A few elements may help the industrialisation and democratisation of dataset creation:
INDUSTRIALISING THE
ANNOTATION PROCESS

1. Improve the UX of annotation tools
We need dedicated software: today there is no software to produce datasets. It’s crazy to think that each company develops its own small software.
A big leap in productivity can be achieved simply by improving the design and the annotation experience.

The second element for increasing the pace of AI production is to work on active learning.
You don’t want to annotate millions of images. Active learning is the science of selecting the most informative images, so that you can build AI with as few images as possible.
2. Active Learning & HITL
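There are many ways to decide which images are "most informative". One widely used strategy is uncertainty sampling: annotate first the images on which the current model hesitates the most. The sketch below assumes that strategy and measures hesitation with the entropy of the predicted class probabilities. The image names, probabilities and annotation budget are invented for the example.

```python
import math
from typing import Dict, List


def entropy(probabilities: List[float]) -> float:
    """Shannon entropy: high when the model hesitates between classes."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def select_for_annotation(predictions: Dict[str, List[float]],
                          budget: int) -> List[str]:
    """Pick the `budget` images whose predictions are the most uncertain."""
    ranked = sorted(predictions,
                    key=lambda name: entropy(predictions[name]),
                    reverse=True)
    return ranked[:budget]


# Hypothetical class probabilities predicted for three unlabelled images.
predictions = {
    "img_a": [0.98, 0.01, 0.01],  # model is confident
    "img_b": [0.34, 0.33, 0.33],  # model has no idea
    "img_c": [0.70, 0.20, 0.10],  # model is fairly unsure
}
to_annotate = select_for_annotation(predictions, budget=2)
```

The confident image is left out: annotating it would teach the model little, so the human effort goes where it changes the result.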

Improve software with machine learning: if software knows
what you’re doing it can really improve the ease of the
3. Improve tools with AI

We intend to reduce the time from 70 to 5 minutes,
increasing productivity more than tenfold.
We intend to reduce the time from
70 to 5 minutes in 10 months
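A quick calculation shows what these per-image times mean at dataset scale. The 10, 70 and 5 minute figures come from the slides. The dataset size of 600 images is a made-up number chosen only to keep the arithmetic readable.

```python
DETECTION_MIN_PER_IMAGE = 10      # boxes around furniture (from the slides)
SEGMENTATION_MIN_PER_IMAGE = 70   # pixel-wise urban scene (from the slides)
TARGET_MIN_PER_IMAGE = 5          # stated goal for segmentation

N_IMAGES = 600                    # hypothetical dataset size


def annotation_hours(n_images: int, minutes_per_image: float) -> float:
    """Total human annotation time in hours."""
    return n_images * minutes_per_image / 60


detection_hours = annotation_hours(N_IMAGES, DETECTION_MIN_PER_IMAGE)
segmentation_hours = annotation_hours(N_IMAGES, SEGMENTATION_MIN_PER_IMAGE)
target_hours = annotation_hours(N_IMAGES, TARGET_MIN_PER_IMAGE)
speedup = SEGMENTATION_MIN_PER_IMAGE / TARGET_MIN_PER_IMAGE

print(f"Detection:    {detection_hours:.0f} h")
print(f"Segmentation: {segmentation_hours:.0f} h today, "
      f"{target_hours:.0f} h at the target")
print(f"Speed-up:     {speedup:.0f}x")
```

Under these assumptions, even a small segmentation dataset costs hundreds of hours of annotation today, and going from 70 to 5 minutes per image is a 14x gain.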

Here machine learning helps the software to annotate images.
This video shows that if you are looking for a face, the box
will automatically adjust around the head. The same goes
for annotating objects pixel-wise, speeding up the
completion of the task.

AI really is for everyone and can solve any company’s
challenges. Algorithms are becoming commodities and
datasets are the bottleneck of AI.
Democratising and industrialising the process of dataset
creation will allow all of us, all companies, to move
forward with our AI applications and goals.
THANK YOU
