孫民 / Artificial Intelligence from a Computer Vision Perspective: The Next Big Thing


Dr. Min Sun (孫民) teaches in the Department of Electrical Engineering at National Tsing Hua University. After graduating from the Department of Electronics Engineering at National Chiao Tung University, he received an M.S. in Electrical Engineering from Stanford, a Ph.D. in Electrical Engineering: Systems from the University of Michigan, Ann Arbor, and completed postdoctoral work in computer engineering at the University of Washington, Seattle. His research interests span computer vision, machine learning, and human-computer interaction. Building on recent deep-learning breakthroughs in computer vision, he works on systems that bridge different subfields of AI, such as automatic video captioning (vision x natural language) and intelligent machines that interact with human behavior (vision x control).


1. Artificial Intelligence: The Next Big Thing, from a computer vision perspective. VSLab, NTHU Electrical Engineering (清大電機), 孫民
2. What's the Next Big Thing? http://research.microsoft.com/en-us/um/redmond/events/fs2015
3. Goal: "big data being the source, machine learning being the technique, and AI being the outcome" (Prof. Hsuan-Tien Lin at IEEE BigData 2016). Many kinds of sources (data) and outcomes (AI tasks) can be trained end-to-end using Deep Learning (DL).
4. Classical AI Tests: the Turing Test, by Alan Turing in 1950
5. Chatbot@F8. https://developers.facebook.com/videos/f8-2016/keynote/
6. Classical AI Tests: CAPTCHA
7. Breaking CAPTCHA, by vicarious.com
8. AlphaGo, 2016, by Google DeepMind. Are these what AI is all about?
9. 2014: Subfields of AI
10. 2015: Artificial General Intelligence (AGI)
11. Deep Learning (DL) • Data • GPU Computing • Talents
12. DL Fuses AI Subfields • Vision and Language (http://mscoco.org/) • Vision and Control (Atari Breakout game & AlphaGo, DeepMind) • Multiple Encoding and Decoding -> AGI
13. Image Captioning: f( image ) = "The man at bat is ready to swing at the pitch." Vision -> Language: a Convolutional Neural Network (CNN) encodes the image (credit: wiki) and a Recurrent Neural Network (RNN) generates the sentence (credit: Nature).
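To make the encoder-decoder pipeline on slide 13 concrete, here is a minimal PyTorch-style sketch (my own illustration, not the talk's actual model): a CNN encodes the image into a feature vector and an RNN (here an LSTM) decodes it into a word sequence. The ResNet-18 backbone and the sizes `vocab_size`, `embed_dim`, `hidden_dim` are illustrative assumptions.

```python
# Minimal CNN-encoder + RNN-decoder captioning sketch (PyTorch).
# All sizes and the ResNet-18 backbone are illustrative, not the speaker's model.
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512):
        super().__init__()
        cnn = models.resnet18(weights=None)                        # CNN encoder (torchvision >= 0.13 API)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])   # drop the classifier head
        self.img_proj = nn.Linear(512, embed_dim)                  # image feature -> word space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # RNN decoder
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feat = self.encoder(images).flatten(1)       # (B, 512) image feature: the "f(image)"
        feat = self.img_proj(feat).unsqueeze(1)      # (B, 1, embed_dim)
        words = self.embed(captions)                 # (B, T, embed_dim)
        seq = torch.cat([feat, words], dim=1)        # image feature kicks off the sequence
        hidden, _ = self.rnn(seq)
        return self.out(hidden)                      # per-step logits over the vocabulary

model = CaptionModel()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 12)))
print(logits.shape)                                  # torch.Size([2, 13, 10000])
```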
14. Image Question Answering. http://visualqa.org/
15. Zhen et al., ECCV 2016, from VSLab and Stanford AI Lab
16. Big Video Data with Titles • Pairs of raw video and title, with video frames encoded by a CNN
17. Viral Videos. Google for "viral video company"
18. Large Video Repository: currently 28,740 videos and growing
19. DL Fuses AI Subfields • Vision and Language (http://mscoco.org/) • Vision and Control (Atari Breakout game & AlphaGo, DeepMind) • Multiple Encoding and Decoding -> AGI
20. Vision and Control (https://gym.openai.com/) • Learning to play games with weak supervision: Reinforcement Learning (RL)
21. Where It All Begins… "Playing Atari with Deep Reinforcement Learning", by DeepMind, NIPS 2013 Deep Learning Workshop. Slides by Yen-Chen Lin.
22. Control: Learning to Act. Playing Breakout amounts to • Input: screen images • Output: actions (do nothing | left | right) -> supervised classification. Slides by Yen-Chen Lin.
23. Supervised Solution • Training data: recorded expert game sessions • Target label: the action the expert takes at every step. Problems: • What if there is no expert? • This is not how humans learn. Slides by Yen-Chen Lin.
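As a rough illustration of this supervised framing (my sketch, not from the talk), the snippet below treats game screens as inputs and the three Breakout actions as class labels; `expert_screens` and `expert_actions` are placeholder tensors standing in for the recorded expert sessions.

```python
# Behavior cloning: screens -> 3-way action labels, trained like any classifier.
# The data tensors below are random placeholders for recorded expert sessions.
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(84 * 84, 256), nn.ReLU(),
    nn.Linear(256, 3),                         # do nothing | left | right
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

expert_screens = torch.randn(32, 1, 84, 84)    # fake batch of 84x84 grayscale screens
expert_actions = torch.randint(0, 3, (32,))    # fake expert action labels

loss = loss_fn(classifier(expert_screens), expert_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```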
24. How Humans Learn • We don't need somebody to tell us a million times which move to choose at each screen • We just need occasional feedback that we did the right thing. Slides by Yen-Chen Lin.
25. Reinforcement Learning • Somewhere between supervised and unsupervised learning • Sparse and time-delayed labels (rewards). Based only on those rewards, the agent has to learn to behave in the environment; a rational agent should maximize total reward. Slides by Yen-Chen Lin.
26. RL in a Nutshell. Slides by Yen-Chen Lin.
27. Markov Decision Process • State • Action • Reward. The probability of the next state s_{i+1} depends only on the current state s_i and action a_i. Slides by Yen-Chen Lin.
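Written out in standard notation, matching the slide's indices, the Markov property is:

\[
P(s_{i+1} \mid s_i, a_i, s_{i-1}, a_{i-1}, \dots, s_1, a_1) \;=\; P(s_{i+1} \mid s_i, a_i).
\]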
28. Episode. One episode of this process (e.g. one game) forms a finite sequence of states, actions and rewards. Slides by Yen-Chen Lin.
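Concretely, with states s, actions a, and rewards r, one episode ending in a terminal state s_n looks like:

\[
s_0, a_0, r_1,\; s_1, a_1, r_2,\; s_2, \dots,\; s_{n-1}, a_{n-1}, r_n,\; s_n.
\]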
29. Example: Breakout • State: game screen • Action: 1. do nothing 2. left 3. right • Reward: game score. Slides by Yen-Chen Lin.
30. Example: Breakout • State: successive game screens • Action: 1. do nothing 2. left 3. right • Reward: game score. Slides by Yen-Chen Lin.
31. Reward • To perform well, we should also take future rewards into account; how do we do that? Total reward and total future reward (written out below). Slides by Yen-Chen Lin.
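Written out for an episode with rewards r_1, ..., r_n, the two quantities named on the slide are:

\[
R = r_1 + r_2 + \dots + r_n, \qquad
R_t = r_t + r_{t+1} + \dots + r_n.
\]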
32. Discounted Future Reward • However, since the environment is stochastic, intuitively one should earn reward as soon as possible. Total discounted future reward (written out below). Slides by Yen-Chen Lin.
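The usual definition, with discount factor 0 ≤ γ ≤ 1; the recursive form at the end is what Q-learning exploits:

\[
R_t = r_t + \gamma\, r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^{\,n-t} r_n
    \;=\; r_t + \gamma\, R_{t+1}.
\]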
33. Q Function • Q(s, a): the maximum discounted future reward when we perform action a in state s, and continue optimally from that point on. It represents the "quality" of a certain action in a given state. Slides by Yen-Chen Lin.
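Using the discounted future reward defined above (where r_{t+1} is the reward received after taking a_t in s_t), the slide's definition can be written as

\[
Q(s_t, a_t) \;=\; \max R_{t+1},
\]

with the maximum taken over all ways of acting from time t+1 onward.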
34. How to Choose an Action? Here π represents the policy, the rule for how we choose an action in each state. If we know the Q function, the choice is simply the action with the highest Q-value (see the equation below). Slides by Yen-Chen Lin.
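With a known Q function, the greedy policy on this slide is simply:

\[
\pi(s) \;=\; \arg\max_{a} Q(s, a).
\]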
35. Q Function Implementation (as a table of Q-values):

            action 0   action 1   action 2
state 0        -2         -1          5
state 1         3          2          3
state 2         5          6         -6

Slides by Yen-Chen Lin.
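Such a table can be filled in with tabular Q-learning. The sketch below is my own minimal illustration (not part of the talk); the 3x3 shape mirrors the slide's table, and the learning rate `alpha`, discount `gamma`, and exploration rate `epsilon` are illustrative values.

```python
# Minimal tabular Q-learning sketch for a 3-state / 3-action toy problem.
# Hyperparameters and the environment interface are illustrative only.
import numpy as np

n_states, n_actions = 3, 3
Q = np.zeros((n_states, n_actions))        # the Q "table" from the slide
alpha, gamma, epsilon = 0.1, 0.99, 0.1     # learning rate, discount, exploration rate

def choose_action(state):
    """Epsilon-greedy policy: usually argmax_a Q(s, a), occasionally explore."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def q_update(s, a, r, s_next):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```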
36. If We Use Pixels as State 1. Resize images to 84x84 2. Convert to grayscale with 256 levels 3. Use the last 4 frames to represent the state. That gives 256^(84x84x4) ≈ 10^67970 possible game states. We can never cover all the cases! Slides by Yen-Chen Lin.
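A quick sanity check of that count (assuming 256 gray levels per pixel, 84x84 pixels, and 4 stacked frames, as listed above):

```python
# Number of decimal digits in 256^(84*84*4), the state count on the slide.
import math
digits = 84 * 84 * 4 * math.log10(256)
print(round(digits))   # 67970 -> roughly 10^67970 distinct pixel states
```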
37. Vision & Control: Deep Q Network. We use a CNN to represent the Q function, which takes • Input: the state (4 game screens) • Output: Q-values of the different actions a (i.e., Q(s,a) for every a). π(s) = argmax_a Q(s, a). Slides by Yen-Chen Lin.
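Below is a minimal sketch of such a Q-network in PyTorch (my illustration; the layer sizes follow the widely used Atari DQN architecture and are not taken from the talk): the input is the stack of 4 preprocessed 84x84 frames and the output is one Q-value per action.

```python
# Minimal DQN-style Q-network: 4 stacked 84x84 frames in, one Q-value per action out.
# Layer sizes are the commonly used Atari DQN choices; treat them as an assumption.
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, n_actions=3):            # Breakout: do nothing | left | right
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),           # Q(s, a) for every action a
        )

    def forward(self, state):                    # state: (B, 4, 84, 84)
        return self.head(self.conv(state))

q_net = DQN()
state = torch.randn(1, 4, 84, 84)
action = q_net(state).argmax(dim=1)              # greedy policy: argmax_a Q(s, a)
```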
38. Fusing Multiple Sensors (figure: hand-manipulation regions from a side-view camera, with labels such as "kettle", "medium wrap", "thumb + 4 finger"). Chan et al., ECCV 2015, from VSLab.
39. (figure: Left Hand / Head / Right Hand camera views recorded in Lab, Office and Home settings)
40. (figure: Left Hand / Head / Right Hand camera views recorded in Lab, Office and Home settings, continued)
41. Recognition from Wearable Cameras (figure: predictions vs. ground truth for gesture recognition and object-category recognition)
42. Real-time Wearable Demo: fisheye camera, NVIDIA TK1
43. Real-time Wearable Demo: cellphone, bottle, keyboard, mouse, free hand
44. Take-Home Message • Encoding source (data): an N-D observation, or an N-D sequence of observations • Decoding outcome (AI tasks): an N-D single output, or an N-D open-ended output sequence • Multiple encoding and decoding • If each module is differentiable (or approximately differentiable) -> end-to-end learning (a minimal sketch follows below). We get many tools to tackle Artificial General Intelligence. Just try! Worst thing: do nothing.
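A minimal sketch of the end-to-end point (my illustration with placeholder modules and sizes): when the encoder and decoder are both differentiable, one backward pass through the decoded outcome updates the whole pipeline jointly.

```python
# End-to-end learning through a differentiable encoder + decoder (placeholder modules).
import torch
import torch.nn as nn

encoder = nn.Sequential(                       # N-D observation -> feature vector
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
decoder = nn.Linear(8, 5)                      # feature -> task output (5 classes here)

x = torch.randn(4, 3, 32, 32)                  # fake batch of observations
target = torch.randint(0, 5, (4,))             # fake task labels
loss = nn.functional.cross_entropy(decoder(encoder(x)), target)
loss.backward()                                # gradients reach both decoder and encoder
```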
45. My Two Cents for Taiwan
46. Questions • Can I simply ask my engineers to use open-source deep learning tools to create new products? Answer: yes and not really. Yes, if you want to complete a well-known task; but Google's MLaaS products will almost always beat you. Not really, if you want to solve your own problem with your own data; you need talent, or engineers who are not afraid of failure.
47. Where Can I Find Talent? • Most of the talent consists of PhD students or young professionals in the US and EU. http://www.economist.com/news/business/21695908-silicon-valley-fights-talent-universities-struggle-hold-their How can we compete?
48. Local Students • Our students know deep learning is HOT! [Deep Learning Workshop at Academia Sinica (中研院)]: 500 participants
49. Case Study: NTHU@TW Undergraduate. https://github.com/yenchenlin1994/DeepLearningFlappyBird
50. Case Study: UNIST@Korea Undergraduate
51. To-Do for Local Students • We need more students to work on realistic deep learning projects with enough computing resources • We need some of them to stay in our local industry. Advanced Deep Learning Course at NTHU (academic year 105, 2016-17): 1. Taught by a group of professors 2. Topics include the latest DNN models, distributed training, and DL for embedded systems 3. Sponsored by MTK and ITRI's Big Data Center (巨資中心) 4. More sponsors are welcome!
52. For Talents Abroad: get in the talent race! http://cvpr2016.thecvf.com/exhibit/industry_expo
53. For Talents Abroad: most of them are fresh PhDs; 1 billion USD pledged
54. For Talents Abroad
55. AI Is Happening Fast
56. Thanks!
