10. ImageNet classification results

| Team        | Year | Place | Error (top-5) | Uses external data |
|-------------|------|-------|---------------|--------------------|
| SuperVision | 2012 | -     | 16.4%         | no                 |
| SuperVision | 2012 | 1st   | 15.3%         | ImageNet 22k       |
| Clarifai    | 2013 | -     | 11.7%         | no                 |
| Clarifai    | 2013 | 1st   | 11.2%         | ImageNet 22k       |
| MSRA        | 2014 | 3rd   | 7.35%         | no                 |
| VGG         | 2014 | 2nd   | 7.32%         | no                 |
| GoogLeNet   | 2014 | 1st   | 6.67%         | no                 |
Slide credit: Yangqing Jia, Google
Invincible?
11. Our approach – Insights and inspirations

"多算胜少算不胜" (Sun Tzu, The Art of War, "Laying Plans"):
more calculation wins; less calculation loses.

"元元本本殚见洽闻" (Ban Gu, "Western Capital Rhapsody"):
the more you see, the more you know.

"明足以察秋毫之末" (Mencius, "King Hui of Liang, Part I"):
sight keen enough to discern the finest detail.
12. Project Minwa – 百度敏娲

• Minerva + Athena + 女娲 (Nüwa)
• Athena: goddess of wisdom, warfare, divine intelligence, architecture, and crafts
• Minerva: goddess of wisdom, magic, medicine, arts, commerce, and defense
• 女娲 (Nüwa): molded humans from clay, mended the sky with smelted stones; associated with marriage and musical instruments

World's Largest Artificial Neural Networks
• Pushing the state of the art
• ~100x bigger than Google Brain
• A new kind of intelligence?
13. Hardware/Software Co-design

• Stochastic gradient descent (SGD)
• High compute density
• Scale up, up to 100 nodes
• High bandwidth, low latency
• 36 nodes, 144 GPUs, 6.9 TB host memory, 1.7 TB device memory
• 0.6 PFLOPS
• Highly optimized software stack
• RDMA / GPUDirect
• New data-partition and communication strategies

[Diagram: GPU nodes interconnected via InfiniBand]
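The co-design points above come together in synchronous data-parallel SGD: each GPU computes a gradient on its shard of a mini-batch, the gradients are all-reduced over the interconnect (the step that RDMA/GPUDirect accelerates), and every replica applies the same averaged update. A minimal NumPy sketch on a toy least-squares problem, with the all-reduce modeled as a plain mean (all names here are illustrative, not from the Deep Image code):

```python
import numpy as np

def allreduce_mean(grads):
    """Average gradients across workers (stands in for an
    all-reduce over the InfiniBand fabric in the real system)."""
    return np.mean(grads, axis=0)

def sync_sgd_step(w, data_shards, lr=0.1):
    """One synchronous data-parallel SGD step on a least-squares
    objective: each worker computes the gradient of mean squared
    error on its shard, gradients are all-reduced, and all
    replicas apply the identical update."""
    grads = []
    for X, y in data_shards:
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # d/dw mean((Xw - y)^2)
        grads.append(grad)
    return w - lr * allreduce_mean(grads)

# Toy problem: recover w_true = [1, 2] from noise-free linear data
rng = np.random.default_rng(0)
w_true = np.array([1.0, 2.0])
X = rng.normal(size=(256, 2))
y = X @ w_true
shards = [(X[i::4], y[i::4]) for i in range(4)]  # split across 4 "GPUs"

w = np.zeros(2)
for _ in range(200):
    w = sync_sgd_step(w, shards)
```

Because the update is synchronous and the gradients are averaged, the result matches single-worker SGD on the full batch; the communication strategy only changes how fast each step completes, not what it computes.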
14. Speedup (wall time to convergence)

[Figure: validation-set accuracy (0 to 0.9) vs. training time in hours (0.25 to 256, log scale) for 1, 16, and 32 GPUs]

At 80% validation accuracy:
• 32 GPUs: 8.6 hours
• 1 GPU: 212 hours
• Speedup: 24.7x
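The quoted speedup is simply the ratio of wall-clock times; dividing by the GPU count also gives the parallel efficiency (the efficiency figure is my derivation, not stated on the slide):

```python
# Wall-clock times to reach 80% validation accuracy (from the slide)
t_1gpu, t_32gpu, n_gpus = 212.0, 8.6, 32

speedup = t_1gpu / t_32gpu      # ratio of wall-clock times, ~24.7x
efficiency = speedup / n_gpus   # fraction of ideal linear scaling

print(f"speedup: {speedup:.1f}x, efficiency: {efficiency:.0%}")
```

So the 32-GPU run retains roughly three quarters of ideal linear scaling, which is what makes the interconnect and partitioning work on the previous slide worthwhile.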
15. Never have enough training examples!

Key observations
• Invariant to the illuminant of the scene
• Invariant to the observer

Augmentation approaches
• Color casting
• Optical distortion
• Rotation, cropping, etc.

Data Augmentation: "见多识广" (the more you see, the more you know)
16. Data Augmentation

| Augmentation    | Number of possible changes                      |
|-----------------|-------------------------------------------------|
| Color casting   | 68920                                           |
| Vignetting      | 1960                                            |
| Lens distortion | 260                                             |
| Rotation        | 20                                              |
| Flipping        | 2                                               |
| Cropping        | 82944 (224x224 crop from a 512x512 input image) |

The Deep Image system learned from ~2 billion examples, out of 90 billion possible candidates.
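The cropping count in the table can be reproduced directly: 82944 = (512 - 224)^2, i.e. the slide counts (image - crop) top-left positions per axis rather than the inclusive (image - crop + 1). A quick check (the helper name is mine):

```python
def n_crops(img_size, crop_size):
    """Number of distinct top-left positions for a square crop inside
    a square image, counted per axis as (img - crop), matching the
    slide's convention; an inclusive count would be (img - crop + 1)."""
    per_axis = img_size - crop_size
    return per_axis * per_axis

print(n_crops(512, 224))  # 288 * 288 = 82944, the slide's cropping row
```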
18. Multi-scale training

• Same crop size, different resolutions
• Fixed crop size of 224x224
• Downsizing training images reduces computational cost, but not at state-of-the-art accuracy
• Different models trained on different image sizes: 256x256 and 512x512
• The high-resolution model works better
  • 256x256: 7.96% top-5 error
  • 512x512: 7.42% top-5 error
• Multi-scale models are complementary
  • Fused model: 6.97% top-5 error

"明察秋毫" (the ability to see the finest detail)
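The slide reports the fused result but not the fusion rule; a common and minimal choice is to average the class-probability outputs of the per-scale models, which is what this sketch assumes (the function name and sample probabilities are illustrative):

```python
import numpy as np

def fuse_predictions(prob_list, weights=None):
    """Fuse class-probability vectors from complementary models
    (e.g. one trained at 256x256 and one at 512x512) by a
    (weighted) average. Plain averaging is an assumption here;
    the slide does not specify Deep Image's actual fusion rule."""
    probs = np.stack(prob_list)  # shape: (n_models, n_classes)
    if weights is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    return np.average(probs, axis=0, weights=weights)

p256 = np.array([0.6, 0.3, 0.1])  # hypothetical low-resolution model output
p512 = np.array([0.2, 0.7, 0.1])  # hypothetical high-resolution model output
fused = fuse_predictions([p256, p512])
```

Averaging helps precisely because the models are complementary: where one scale is confidently wrong, the other can pull the fused prediction back toward the correct class.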
20. Model

• One basic configuration has 16 layers
• The number of weights in our configuration is 212.7M, about 40% bigger than VGG's

| Team       | Top-1 val. error | Top-5 val. error |
|------------|------------------|------------------|
| GoogLeNet  | -                | 7.89%            |
| VGG        | 25.9%            | 8.0%             |
| Deep Image | 24.88%           | 7.42%            |
21. Compare to state-of-the-art

Deep Image has set a new record of 5.98% top-5 error on the test dataset, a 10.2% relative improvement over the previous best result.

| Team          | Year | Place | Top-5 test error |
|---------------|------|-------|------------------|
| SuperVision   | 2012 | 1     | 16.42%           |
| ISI           | 2012 | 2     | 26.17%           |
| VGG           | 2012 | 3     | 26.98%           |
| Clarifai      | 2013 | 1     | 11.74%           |
| NUS           | 2013 | 2     | 12.95%           |
| ZF            | 2013 | 3     | 13.51%           |
| GoogLeNet     | 2014 | 1     | 6.66%            |
| VGG           | 2014 | 2     | 7.32%            |
| MSRA          | 2014 | 3     | 8.06%            |
| Andrew Howard | 2014 | 4     | 8.11%            |
| DeeperVision  | 2014 | 5     | 9.51%            |
| Deep Image    | -    | -     | 5.98%            |
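The "10.2% relative improvement" follows from the table's two best entries, GoogLeNet's 6.66% and Deep Image's 5.98%:

```python
# Top-5 test error rates, in percent (from the table)
prev_best, deep_image = 6.66, 5.98

relative_improvement = (prev_best - deep_image) / prev_best
print(f"{relative_improvement:.1%}")  # prints "10.2%"
```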
30. Major differentiators

• Custom-built supercomputer dedicated to deep learning
• Simple, scalable algorithm + fully optimized software stack
• Larger models
• More aggressive data augmentation
• Multi-scale training, including high-resolution images

Brute force + insights, pushed to the extreme