Computer Vision - now working in over 2 Billion Web Browsers!

Rob Manson 
CEO & co-founder
Sebastian Montabone 
Computer Vision Engineer
Mixed Reality. In the web. On any device.
https://try.awe.media

So what is Mixed Reality?
Here’s a short demo of Milgram’s Mixed Reality Continuum - all running in a browser.
awe.media

A brief/biased history of Computer Vision
1957 - Russell A. Kirsch scans first photo with a computer
1960 - Larry Roberts publishes thesis at MIT
1964 - First facial recognition system (unnamed intelligence agency)
1976 - UK Police create first License Plate recognition system
1978 - David Marr proposes edge detection framework at MIT
1985 - Lockheed Martin/Carnegie Mellon create first self-driving land vehicle
1992 - Tom Caudell at Boeing coins the term Augmented Reality
1999 - Billinghurst & Kato publish/demo ARToolkit at IWAR/SIGGRAPH
2000 - Windows-only alpha version of OpenCV launched at CVPR
2007 - OpenCV 1.0 released
2008 - ARToolkit ported to Flash by @saqoosha
2011 - ARToolkit ported to JavaScript by Ilmari Heikkinen
2011 - FastCV/Vuforia 1.0 released
2017 - Facebook adds Computer Vision to their camera app
2017 - OpenCV in the browser demonstrated here

How does Computer Vision work in the browser?
camera -> gUM -> video -> canvas -> pixels -> vision algorithms
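The last hop of this pipeline (pixels -> vision algorithms) can be sketched without any browser APIs. The example below is only an illustration, not code from the talk: it assumes the frame has already been drawn to a 2D canvas and read back as RGBA bytes (in a browser, typically via `drawImage()` followed by `getImageData()`), and the function name `rgbaToGray` is hypothetical.

```typescript
// Convert an RGBA frame (4 bytes per pixel, row by row, the layout used by
// canvas ImageData) into an 8-bit grayscale image, which is the usual input
// for feature detectors and trackers.
// `rgbaToGray` is a hypothetical helper name, not part of any library.
function rgbaToGray(
  rgba: Uint8ClampedArray,
  width: number,
  height: number
): Uint8Array {
  if (rgba.length !== width * height * 4) {
    throw new Error("rgba length does not match width * height * 4");
  }
  const gray = new Uint8Array(width * height);
  for (let i = 0, p = 0; p < gray.length; i += 4, p++) {
    // Integer approximation of the common luma weights
    // 0.299 R + 0.587 G + 0.114 B (77 + 150 + 29 = 256); alpha is ignored.
    gray[p] = (rgba[i] * 77 + rgba[i + 1] * 150 + rgba[i + 2] * 29) >> 8;
  }
  return gray;
}
```

Working on a single grayscale channel cuts the data to a quarter of the RGBA size, which matters when every camera frame has to be processed in script.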

HTMLVideoElement
This is a container for decoding and presenting video streams.
This brought plugin-free video to the web.

Canvas, WebGL & the ArrayBuffer
The 2D Canvas gave us the ability to convert a video stream into pixel data.
WebGL brought 3D Canvases with access to the GPU.
But most importantly WebGL gave us ArrayBuffers, which allowed us to access the pixel data for the first time.
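To make the ArrayBuffer idea concrete, here is a minimal sketch of treating a frame as raw bytes and addressing individual pixels through a typed-array view. It assumes the RGBA, row-by-row layout used by canvas pixel data; the helper names `createFrame`, `setPixel` and `getPixel` are hypothetical and only for illustration.

```typescript
// A frame is just a block of bytes: 4 per pixel (R, G, B, A), row by row.
// All helper names here are hypothetical, chosen for this sketch.
function createFrame(width: number, height: number): ArrayBuffer {
  return new ArrayBuffer(width * height * 4);
}

// Write one pixel through a Uint8ClampedArray view over the buffer.
// The clamped view limits each channel to the 0..255 range.
function setPixel(
  buf: ArrayBuffer,
  width: number,
  x: number,
  y: number,
  rgba: [number, number, number, number]
): void {
  const view = new Uint8ClampedArray(buf);
  view.set(rgba, (y * width + x) * 4);
}

// Read one pixel back through a 4-byte view at the pixel's byte offset.
function getPixel(
  buf: ArrayBuffer,
  width: number,
  x: number,
  y: number
): [number, number, number, number] {
  const v = new Uint8ClampedArray(buf, (y * width + x) * 4, 4);
  return [v[0], v[1], v[2], v[3]];
}
```

Both helpers create views over the same underlying buffer rather than copying it, which is what makes per-frame pixel access affordable.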

JSARToolkit
In 2011 Billinghurst & Kato's ARToolkit was ported to JavaScript.

Enter WebRTC's getUserMedia()
Some claim this has a latency that makes the web unusable for AR.
But here are the numbers, measured on a Pixel: the maximum difference is ~200ms
200-250ms - Camera stream in a native AR app
350-400ms - gUM stream in a web app

WebRTC's getUserMedia()
FAST feature detection & Tigerstail in 2012
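For readers unfamiliar with FAST, the core idea is a segment test: a pixel is a corner candidate if enough contiguous pixels on a small circle around it are all clearly brighter or all clearly darker than the centre. The sketch below is a simplified illustration of that test, not the code shown in the talk. It assumes a grayscale image, a 16-pixel circle of radius 3, a run length of 9, and omits the speed-ups and non-maximum suppression that real implementations use. All names are hypothetical.

```typescript
// Offsets of the 16-pixel circle of radius 3 around a candidate pixel,
// listed clockwise starting from the top.
const CIRCLE_DX: number[] = [0, 1, 2, 3, 3, 3, 2, 1, 0, -1, -2, -3, -3, -3, -2, -1];
const CIRCLE_DY: number[] = [-3, -3, -2, -1, 0, 1, 2, 3, 3, 3, 2, 1, 0, -1, -2, -3];

// Segment test: true if at least `n` contiguous circle pixels are all
// brighter than centre + threshold or all darker than centre - threshold.
// The caller must keep (x, y) at least 3 pixels away from the image border.
function isFastCorner(
  gray: Uint8Array,
  width: number,
  x: number,
  y: number,
  threshold: number,
  n: number = 9
): boolean {
  const center = gray[y * width + x];
  let run = 0;
  let prev = 0;
  // Walk 16 + n - 1 positions so runs that wrap past index 15 are counted.
  for (let k = 0; k < 16 + n - 1; k++) {
    const i = k % 16;
    const v = gray[(y + CIRCLE_DY[i]) * width + (x + CIRCLE_DX[i])];
    const state = v > center + threshold ? 1 : v < center - threshold ? -1 : 0;
    run = state !== 0 && state === prev ? run + 1 : state !== 0 ? 1 : 0;
    prev = state;
    if (run >= n) return true;
  }
  return false;
}

// Scan every pixel that has a full circle inside the image.
function detectFastCorners(
  gray: Uint8Array,
  width: number,
  height: number,
  threshold: number
): { x: number; y: number }[] {
  const corners: { x: number; y: number }[] = [];
  for (let y = 3; y < height - 3; y++) {
    for (let x = 3; x < width - 3; x++) {
      if (isFastCorner(gray, width, x, y, threshold)) corners.push({ x, y });
    }
  }
  return corners;
}
```

The test needs only comparisons and additions per pixel, which is one reason this style of detector is a practical fit for per-frame processing in script.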

Tracking.js released in 2012

AR.js released in 2017

Transpiling OpenCV
This brings a more general computer vision toolkit to the web!

But there's no gUM on iOS?
For Vision-based functionality we fall back to Visual Search
For Location-based apps we fall back to 360°/VR (like Pokemon Go with the camera off)
And remember, “video see-through” is not the only form of AR
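The fallback rules above can be written down as a small decision function. This is only a sketch of the logic as described on this slide, not the awe.media implementation: the type names, the `chooseExperience` function and the `hasGetUserMedia` flag are all hypothetical. In a browser the flag would usually come from a feature check for `navigator.mediaDevices.getUserMedia`.

```typescript
// Hypothetical names for the three experiences described on the slide.
type Experience = "camera-ar" | "visual-search" | "360-vr";
type AppKind = "vision" | "location";

interface Capabilities {
  // Assumed to be filled in by a feature check in the browser, for example
  // testing that navigator.mediaDevices.getUserMedia exists.
  hasGetUserMedia: boolean;
}

// Pick the experience to offer: live camera AR when a camera stream is
// available, otherwise the fallback that matches the kind of app.
function chooseExperience(kind: AppKind, caps: Capabilities): Experience {
  if (caps.hasGetUserMedia) {
    return "camera-ar";
  }
  return kind === "vision" ? "visual-search" : "360-vr";
}
```

Keeping the decision in one pure function makes it easy to check each fallback path without needing a device that lacks camera access.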
