1. AI NEXTCon Silicon Valley 18
SANTA CLARA | APRIL 10-13
#ainextcon
Adrian Kaehler, CEO Giant.AI
2. } Author: Learning OpenCV and Learning OpenCV 3
} Winner 2005 DARPA Grand Challenge – wrote
vision system
} Vice President and head of Robotics and Machine
Learning at Applied Minds for 8 years, built
autonomous vehicles for JIEDDO for the Iraq and
Afghanistan wars.
} Magic Leap
} Founder: Silicon Valley Deep Learning Group
} Founder and CEO: Giant.AI
4/13/2018 2
3. } Introduction
} Section I: Turning point, DARPA challenges, and old approaches
} Section II: From then until now, important events since
} Wrap Up: What’s now and what’s coming.
} 1991: USDOT mandate to “demonstrate an automated vehicle and highway system by 1997”
} 1994: Daimler-Benz and Ernst Dickmanns of UniBwM drove more than 1,000 km in Paris demonstrating
autonomous driving in free lanes, convoy driving, and lane changes with autonomous passing of other
cars
} 1995: CMU Navlab project completed a 5,000 km cross-country journey, of which 98.2% was
autonomously controlled. Navlab projects made heavy use of computer vision. In one case, steering was
controlled by a neural network using video images.
} 1995: Dickmanns demonstrates the use of saccadic computer vision for control.
} 1996: ARGO project uses stereo vision to find lane markings and drives almost 2,000 km.
} 1997: At “Demo 97” in San Diego, major carmakers demonstrated automated vehicles on an automated
highway, the demonstration called for by the 1991 mandate.
} 1999: First driverless transportation of human passengers by the Netherlands’ ParkShuttle project using
embedded magnets (not computer vision).
During this period, LIDAR systems were extremely
expensive and did not play a large role in
autonomous driving projects. The general belief
among researchers was that cameras and
computer vision techniques would solve this
problem.
6.
• 1985 – ALV: Road Following Demonstration: Vehicle traverses 2 km at 10 km/hr.
Forward motion only, with no obstacle avoidance required.
• 1986 - ALV: Obstacle Avoidance Demonstration: Vehicle traverses 5 km road
course at speeds up to 20 km/hr; must recognize and maneuver to avoid fixed
objects that are small with respect to road width.
• 1987 – ALV: Cross-country Route Planning Demonstration: Vehicle plans and
executes a 5 km traverse of open desert terrain. Demonstrates soil and ground
cover typing.
• 1988 - ALV: Road Network Route Planning and Obstacle Avoidance Demonstration:
Vehicle plans and executes a 20 km point-to-point traverse through a road network using
landmarks as navigation aids. Demonstration includes map updating and off-road
maneuvering to avoid obstacles.
• 1989 - ALVINN: a 3-layer back-propagation network is designed for the task of road
following. ALVINN takes images from a camera and a laser range finder and
determines the direction the vehicle should travel in order to follow the road. The
network can effectively follow real roads under certain field conditions.
• 1992 - Demo I (Army): Though the emphasis of this program was teleoperation, it
used a stereo camera system for obstacle detection and safe vehicle stopping as
part of an "enhanced teleoperation" focus.
• 1998 - Demo II (DARPA): Introduced high resolution stereo vision as a means of
obstacle detection.
• 2001 - Demo III (Army): Demonstrated the ability of unmanned ground vehicles to
navigate miles of difficult off-road terrain, avoiding obstacles such as rocks and
trees.
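To make the 1989 ALVINN entry concrete, here is a minimal sketch of a forward pass through a three-layer network (input, hidden, output) that turns an image into a steering direction. It is not the original ALVINN code: the layer sizes, the random untrained weights, and the names `make_layer`, `dense_sigmoid`, and `steering_direction` are all assumptions for illustration, and the back-propagation training step is omitted.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense_sigmoid(inputs, layer):
    """Apply a fully connected layer followed by a sigmoid activation."""
    weights, biases = layer
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]

def steering_direction(pixels, hidden_layer, output_layer):
    """Input -> hidden -> output; the most active output unit picks the direction.

    Output units are assumed to span hard left (-1.0) to hard right (+1.0).
    """
    hidden = dense_sigmoid(pixels, hidden_layer)
    outputs = dense_sigmoid(hidden, output_layer)
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return 2.0 * best / (len(outputs) - 1) - 1.0

# Hypothetical sizes chosen for the example, not taken from the slides.
N_PIXELS, N_HIDDEN, N_DIRECTIONS = 64, 5, 9

rng = random.Random(0)
hidden_layer = make_layer(N_PIXELS, N_HIDDEN, rng)
output_layer = make_layer(N_HIDDEN, N_DIRECTIONS, rng)
image = [rng.random() for _ in range(N_PIXELS)]  # stand-in for a camera frame
direction = steering_direction(image, hidden_layer, output_layer)
```

With untrained weights the returned direction is meaningless; the point is only the shape of the computation, in which pixels go in and a single steering choice comes out.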
9. By using computer vision, Stanley was able to extrapolate, from the very small patch
of road visible to its LIDAR sensors, far into the distance as a means of
comprehending the road ahead.
However, this was the only function vision was used for in the winning vehicle.
10. The TerraMax vehicle from Oshkosh Defense used LIDAR and Stereo Vision to detect
positive and negative obstacles.
11. Vehicles needed to demonstrate parking, lane following, rule adherence,
and interaction with other vehicles. Vision becomes much more
important for these tasks and appears widely in many vehicles.
12. Note that this is just a sampling of the many participants in the DARPA Urban Challenge.
Vehicle  KNIGHT RIDER  CAROLINE      TERRAMAX  JUNIOR    BOSS
Team     UCF           Braunschweig  Oshkosh   Stanford  CMU
Vision functions compared: Lane Detection, Road Shape Estimation, Drivability, Obstacles
13. Founded in 1999, MobilEye has focused on vision-based
Advanced Driver Assistance Systems (ADAS).
Their technology was used by the CMU team in the
DARPA Urban Challenge.
Early systems provided:
◦ Lane detection
◦ Lane departure warning
◦ Obstacle/Pedestrian detection
◦ Vehicle distance measurements
By 2007, MobilEye systems were for sale in
commercial vehicles including the Cadillac STS and
DTS.
There is much more to say about MobilEye today. They are developing much
broader-scope solutions that use vision to provide braking, sign, and traffic light
detection. Their planned EyeQ-5 chip may do much more (e.g. path planning).
14. } In 2009 Google launched what is now
Waymo, their self driving car project,
bringing together many key players from the
DARPA challenges.
} Stanford’s distrust of computer vision was
clearly inherited by the Google vehicles, but
still vision was unavoidable for many tasks.
} In this era LIDAR continued to dominate
mainstream thought about vehicle
autonomy.
} We will see, however, that other players
would not be so hesitant to use computer
vision.
} Many saw LIDAR as an insurmountable cost
barrier for wide adoption of vehicle
autonomy.
15. In the years following the DARPA challenges, many researchers
worked to integrate the learnings from the many challenge
teams and to put many of the techniques developed for those
races onto a firmer theoretical footing.
In computer vision, important algorithms such as the Dalal and Triggs “HOG”
pedestrian detector (2005) were refined and evolved and new techniques were
developed, such as the use of deformable parts models to reliably find objects
such as bicyclists (2011).
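As a rough illustration of the idea behind the “HOG” (histogram of oriented gradients) descriptor mentioned above, the sketch below builds the orientation histogram for a single image cell. It is a simplification under stated assumptions: the function name `cell_histogram`, the 9-bin default, and the toy 4x4 cells are chosen for the example, and the block normalization, vote interpolation, and classifier of a full detector are left out.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientations (0 to 180 degrees) for one cell.

    `cell` is a list of rows of grayscale values. Each interior pixel votes
    for the bin of its gradient direction, weighted by gradient magnitude.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    rows, cols = len(cell), len(cell[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# A vertical edge (dark left, bright right) has purely horizontal gradients.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
# A horizontal edge (dark top, bright bottom) has purely vertical gradients.
horizontal_edge = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]

vertical_hist = cell_histogram(vertical_edge)
horizontal_hist = cell_histogram(horizontal_edge)
```

The two toy cells land in different bins (the 0 degree bin and the 90 degree bin respectively), which is the property a detector exploits: the outline of a pedestrian produces a characteristic pattern of edge orientations across many cells.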
16. } In 2012, the KITTI dataset was
released.
} Sensors: GPS/RTK INS, stereo cameras
(monochrome and color), Velodyne LIDAR.
} In this system, the INS and LIDAR are
used to create a ground truth dataset
that can be used to evaluate the
performance of vision algorithms.
Professional annotators assigned
bounding boxes and classes to
important objects (cars, pedestrians,
bicycles, etc.)
} For development of: stereo, optical
flow, 3D reconstruction, 3D object
detection, and 3D object tracking.
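One common way to score a detector against annotated bounding boxes like these is intersection over union (IoU). The sketch below is a generic illustration, not the dataset's official evaluation code; the `(x_min, y_min, x_max, y_max)` box layout, the function name `iou`, and the 0.5 acceptance threshold are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

ground_truth = (100, 120, 200, 220)  # hypothetical annotated car, in pixels
prediction = (110, 130, 210, 230)    # hypothetical detector output

overlap = iou(ground_truth, prediction)
MATCH_THRESHOLD = 0.5                # illustrative value only
is_match = overlap >= MATCH_THRESHOLD
```

A prediction counts as correct only when its overlap with the annotated box clears the threshold, which gives different algorithms a shared yardstick.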
17. } The appearance of datasets such as KITTI allows new algorithms to be developed
faster and compared to one another more efficiently, and lets more players
work on the problem.
19. } In 2012 “AlexNet” put Neural Networks
back on the map by decisively defeating
all other approaches on an important
computer vision benchmark task.
} More importantly, this approach, now
called Deep Neural Networks, opened the
door to solving a wide variety of
problems with computer vision that were
previously considered unreachable.
} Suddenly the economics of vision
became extremely compelling. A
camera can cost as little as $1, while a
Velodyne HDL-64E costs nearly $70,000.
20. The computer vision and machine learning
communities immediately began exploring
the limits of this technology, and many new
things were found to be possible.
Suddenly, results thought to be
generations away began to occur on a
regular basis.
◦ Computers beat Go masters
◦ Voice recognition starts to really work
◦ Chatbots passing the Turing Test
◦ Soon robots will be folding your laundry!
We will look at just a few interesting
results...
21. The original SegNet was used for
semantic segmentation.
A Deep Neural Network (DNN) learns a latent
representation for a scene that it then
interprets into semantic pixel labeling.
Later, similar structures were used to learn
such things as color for black-and-white
movies and depth images from monocular
video.
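The last step of semantic pixel labeling can be sketched very simply: once a network has produced a score for every class at every pixel, each pixel takes the label of its highest-scoring class. The example below assumes those scores already exist and does not implement the network; the class names, the tiny 2x2 score grid, and the function name `label_pixels` are hypothetical.

```python
def label_pixels(scores, class_names):
    """Pick the highest-scoring class for every pixel.

    `scores[r][c]` is a list holding one score per class for pixel (r, c).
    """
    labels = []
    for row in scores:
        labels.append(
            [class_names[max(range(len(px)), key=px.__getitem__)] for px in row]
        )
    return labels

CLASS_NAMES = ["road", "car", "sky"]  # hypothetical label set

# Hypothetical per-pixel class scores for a 2x2 image.
scores = [
    [[0.1, 0.2, 0.7], [0.2, 0.1, 0.7]],
    [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1]],
]

labels = label_pixels(scores, CLASS_NAMES)
# labels == [["sky", "sky"], ["road", "car"]]
```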
23. YOLO finds objects in an image and predicts bounding
boxes for them while simultaneously classifying them.
The system uses a combination of DNNs and other
more traditional probabilistic techniques.
The result is a high-performance, extremely fast system
for detecting a wide variety of different classes of
interest. The latest version, YOLO9000, can find nine
thousand different classes of objects.
This is another example of a result that only a few
years ago would have seemed decades away.
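The slide does not name the techniques that accompany the network. One post-processing step widely used with box detectors is greedy non-maximum suppression (NMS), which discards lower-scoring boxes that heavily overlap a stronger box of the same class. The sketch below shows that step as an illustrative assumption, not as YOLO's actual implementation; the detection tuple layout, the helper names, and the 0.5 threshold are all hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0

def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over detections given as (score, class_name, box) tuples.

    Detections are visited from highest to lowest score; one is kept unless it
    overlaps an already kept detection of the same class too strongly.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(
            det[1] != k[1] or box_iou(det[2], k[2]) < iou_threshold for k in kept
        ):
            kept.append(det)
    return kept

# Hypothetical raw detector output: two overlapping "car" boxes, a pedestrian
# in the same place, and a second car elsewhere in the image.
raw = [
    (0.9, "car", (0, 0, 10, 10)),
    (0.8, "car", (1, 1, 11, 11)),
    (0.7, "pedestrian", (1, 1, 11, 11)),
    (0.6, "car", (50, 50, 60, 60)),
]
final = non_max_suppression(raw)
```

Here the 0.8 car box is dropped because it overlaps the 0.9 car box, while the pedestrian and the distant car survive.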
25. And there is no end in sight. Deep Neural
Network approaches to problems keep
giving better and better results, and
nobody can say where it will stop.
Mask R-CNN is an improvement on the
prior algorithms as it combines object
finding with pixel-level labeling for each
object type.
Mask R-CNN can also be used for human
pose estimation. We’ll see in a moment
what that has to do with self-driving cars.
27.
Supervised Autonomy | Full Autonomy
Traditional Approaches with Deep Learning Components
Fully Learned Systems that Map Vision Directly to Control
28. The overall problem of a self driving car is not
solved, but many important components are “solved
enough” to deploy into the market.
Some are now working to further improve those
same components, while others are looking at
additional ways the available technology can benefit
the market.
Here are some new directions to watch:
Using vision to:
◦ make sure the driver is aware (glancing behaviors)
◦ make sure the driver is physically engaged (grasping
the steering wheel)
◦ make sure the driver is satisfied with their experience
(emotion)
◦ build maps and perform localization
(SLAM)
Images from Lex Fridman, MIT
https://selfdrivingcars.mit.edu
29. Though there are many approaches to autonomy and how it will be applied in the
products of individual companies, some things seem certain:
} The future for computer vision is bright
} The future for autonomous cars is bright
} The future for computer vision in autonomous cars is very bright
My Predictions:
} LIDAR gets cheaper, but vision is getting better even faster.
} This does not mean that vision solves all problems, but it is fast on its way to
being the central sensor in autonomous vehicles.