13. • Intelligence and perception are intertwined – “Nous” (Aristotle)
• Solving speech / vision for computers will help solve AI – deep learning started in CV
14. [Diagram: video analytics pipeline]
Video → Background/Foreground Detector → Object Tracking → Object Classification → Scene Analysis and Registration
Intermediate outputs: current background image; objects and object trajectories; object categories; scene structure and known positions
Derived analytics: position change / position matching (how busy places are); place occupancy; virtual turnstile (cars, people, vehicle counts; foot / ped traffic); speed estimation (vehicle, pedestrian speed); line analysis (estimated wait time, line size)
Legend: P = protected by patent; U = unique approach, specific know-how
Let’s start with a quick introduction to computer vision.
Meet David Marr, neuroscientist, psychologist, and the actual inventor of computer vision.
A tragic destiny in several respects. First, he died at 35, in 1980, two years before his masterwork of a book was even published. Second, he got famous for something he didn’t do on purpose: inventing computer vision.
His idea was to understand the brain by trying to solve the problems it solves, starting with vision. Even though he did not mean to, and would never know about it, this was the founding moment of computer vision.
Why is computer vision a hard problem?
Well, digital images are just matrices of pixels. Each pixel has three values: R, G, and B. When you display them on a screen, the human visual system interprets them and recognizes known objects, like letters or people, attributes, and concepts.
Try to think of an algorithm to transform a matrix of zeros and ones into the 8th letter of the alphabet. OK, you can get back to me on that later.
Now think about how to get from the picture below to recognizing a face, a male, an expression, and matching the same face on the internet to understand who that person is…
Cherry on the cake, add a third dimension to the problem: time. And you get to video analysis.
This is the challenge of computer vision. Getting from raw arrays of numbers to known objects, concepts, movements, behaviors, and even sentiments.
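To make that gap concrete, here is a toy sketch (entirely made up, not a real recognizer): a tiny binary image of the letter “H” and an exact-match test. Everything real vision must handle, like noise, scale, rotation, and lighting, is ignored here.

```python
# Toy illustration: "recognizing" the letter H in a tiny binary image
# by exact template matching. Real computer vision has to cope with
# noise, scale, rotation, and lighting -- none of which this handles.

H_TEMPLATE = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 1],
]

def looks_like_h(image):
    """Return True if the 5x3 binary image exactly matches the template."""
    return image == H_TEMPLATE

print(looks_like_h(H_TEMPLATE))  # → True
```

The point of the toy: even this trivial matcher breaks the moment a single pixel flips, which is why real recognition needs far more robust machinery.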
The birth of computer vision also owes a lot to another major player: the “super” computers of the ’80s, like this DEC VAX running VMS. Lots of pixels require powerful machines and storage. The VAX offered both. Magnetic tapes could store and retrieve up to 10 digital images each … in minutes!
And this was NOT fun – I actually used one of those during the first 6 months of my PhD – French research labs are really underfunded…
My point is that the birth of, and every subsequent leap in, computer vision has seen the same alignment of stars:
one smart person,
a leap in storage,
a leap in computation power.
Computer vision was originally reserved for high-budget, high-security applications: missile guidance, satellite reconnaissance, industrial production, medical imaging.
Critical applications
Highly aided applications
Fast forward 20 years – I’ll skip most of the iterative progress, including the major contribution of my PhD thesis - and let’s focus on what it does now.
Google bought Neven Vision in 2006 and introduced image search on Google web search. You can now search by color, image similarity, etc.
Amazon bought SnapTell in 2009. You can now buy a book by pointing your phone at it. Or interact with an ad, or preview a movie by pointing your phone at a poster.
Facebook bought Face.com in 2012, and Apple bought Polar Rose in 2010. And now all your friends can be tagged automatically in your images.
Samsung TVs and phones can now be controlled by gestures.
And some digital advertising displays now recognize your gender, attention span, and emotions – Quividi.
The Vivino wine app has 5 million users.
This is all here already. Image recognition is all over your digital life. Don’t use copyrighted images on your website, you’ll get caught!
Next time you hang out with computer vision people (you never know), here are just a couple of buzzwords you absolutely need to know.
Interestingly enough, Yann LeCun, a pioneer of deep learning and the inventor of CNNs, was just hired by Facebook. Who knows what they’re preparing there!
And Google just acquired a deep learning company called DeepMind, with little more than a landing page, for $400M+.
Recognize one specific, mostly 2D object.
Interest point detection
Local feature computation and indexing
Search and match
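The three steps above can be sketched roughly like this (a toy sketch with invented descriptors; real pipelines use detectors and descriptors like SIFT or ORB, and much larger indexes):

```python
# Minimal sketch of the "detect -> describe -> match" recipe for
# recognizing one specific, mostly 2D object. Each interest point is
# faked here as a short binary descriptor string, and matching is
# nearest-neighbor search in Hamming distance.

def hamming(d1, d2):
    """Number of differing bits between two equal-length bit strings."""
    return sum(a != b for a, b in zip(d1, d2))

def match(query_descriptors, index_descriptors, max_dist=1):
    """For each query descriptor, find the closest indexed descriptor."""
    matches = []
    for qi, q in enumerate(query_descriptors):
        best = min(range(len(index_descriptors)),
                   key=lambda i: hamming(q, index_descriptors[i]))
        if hamming(q, index_descriptors[best]) <= max_dist:
            matches.append((qi, best))
    return matches

# Descriptors "extracted" from a reference image of the object (made up):
index = ["1010", "1100", "0011"]
# Descriptors from a new photo, slightly corrupted by noise:
query = ["1011", "1100"]
print(match(query, index))  # → [(0, 0), (1, 1)]
```

Enough consistent point matches between the query image and one indexed image is what lets a system declare “same object”.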
Recognize one type of object, starting with faces.
“Simple features” combination
Each weak feature only has to be right a little more than 50% of the time, and removes some candidates.
Train the following layer on the output of the previous one.
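A Viola-Jones-style cascade can be sketched like this (my simplification with made-up thresholds, not the original algorithm): each stage is a cheap test that is only slightly better than chance, but every stage rejects some non-face candidates, so most windows exit early.

```python
# Sketch of a detection cascade: a window is accepted as a face only
# if it survives every stage; a failure at any stage rejects it
# immediately, which keeps the average cost per window very low.

def make_stage(threshold):
    """A weak stage: passes any candidate whose score exceeds threshold."""
    return lambda candidate: candidate["score"] > threshold

def cascade(candidate, stages):
    """Run stages in order; all() short-circuits at the first failure."""
    return all(stage(candidate) for stage in stages)

stages = [make_stage(t) for t in (0.1, 0.3, 0.5)]  # made-up thresholds

windows = [{"score": 0.2}, {"score": 0.6}, {"score": 0.05}]
faces = [w for w in windows if cascade(w, stages)]
print(faces)  # → [{'score': 0.6}]
```

The design choice worth noting is the early exit: the cheap first stages throw away the vast majority of candidate windows before the expensive later stages ever run.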
Recognize many types of objects
Self-organizing content
Unsupervised learning
Applications: not clear…
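To illustrate the “self-organizing, unsupervised” idea, here is a tiny 1-D k-means (my illustrative choice; deep unsupervised learning uses far richer models): points group themselves around centroids with no labels provided.

```python
# Hedged sketch of unsupervised clustering: alternate between assigning
# each point to its nearest centroid and moving each centroid to the
# mean of its assigned points. No labels are ever given.

def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data; returns the final centroid positions."""
    for _ in range(iterations):
        clusters = {c: [] for c in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        centroids = [sum(ps) / len(ps) if ps else centroids[c]
                     for c, ps in clusters.items()]
    return centroids

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]        # two obvious groups
print(kmeans_1d(data, centroids=[0.0, 5.0]))  # centroids settle near 1 and 9
```

The structure (two groups) emerges from the data alone, which is the sense in which the content “self-organizes”.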
What’s next? Tons of data now come from video feeds (the previous explosion was photos).
Since Steve Mann started wearing cameras 24/7 in the 90’s, it seems like everyone likes to attach cameras to their lives.
Dashboard cameras can help monitor traffic incidents in real time, without the need to manually enter data like in Waze.
Google could probably use Google Glass feeds to monitor its consumers’ habits.
Dropcams, originally for security, are now a hub for the connected home and are also used in countless public places to show consumers what a place looks like right now. Their streams could be used to quantify your home or public places as well.
And I will spare you the commercial drones, of which you have probably already heard a lot.
More and more cameras are there and computer vision will sooner or later turn them all into local data feeds.
We have protected most of the three layers of our algorithm stack with two patents.
A number of the functions implemented in our stack are unique and require proprietary, very specialized data and knowledge to build.
We are very strict about applying test-driven development principles to our algorithm development process.
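Here is what test-driven development can look like for a vision algorithm (an illustrative sketch with invented names, not the company’s actual code): the expected behavior is written as assertions first, and the algorithm is implemented until they pass.

```python
# Hypothetical example: TDD around a naive background-subtraction step.
# Pixels that differ from the background model by more than `threshold`
# are marked as foreground.

def subtract_background(frame, background, threshold=10):
    """Per-pixel foreground mask: True where the frame differs from the
    background model by more than `threshold`."""
    return [[abs(f - b) > threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def test_static_scene_has_no_foreground():
    bg = [[100, 100], [100, 100]]
    assert subtract_background(bg, bg) == [[False, False], [False, False]]

def test_moving_object_is_detected():
    bg = [[100, 100], [100, 100]]
    frame = [[100, 100], [100, 200]]   # bright blob in one corner
    assert subtract_background(frame, bg)[1][1] is True

test_static_scene_has_no_foreground()
test_moving_object_is_detected()
print("all tests pass")
```

Encoding behaviors like “a static scene produces no foreground” as tests is what lets algorithm changes be made without silently breaking downstream analytics.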