SlideShare a Scribd company logo
1 of 31
Download to read offline
Advancements in Deep Learning and AI:
Stepping inside Baidu’s “Brain”
	

Dr. Ren Wu	

Distinguished Scientist, Baidu	

wuren@baidu.com 	

@韧在百度
Baidu Stock
Baidu Q2’14
万物互联	
 万物智能	
大数据时代
人
互联网
索引知识	
⼈人机交互	
物理世界
传感器,	
3D建模	
百度大脑
	
  	
  Deep	
  Learning	
  Pla-orm	
  
高性能计算
搜索,
⼲⼴广告,	
预测,决策	
智能硬件,	
机器⼈人, 	
⾃自动驾驶	
探索发现,	
3D打印	
信息感知 思考, 学习 决策, 行动, 创造
全景图:基于百度大脑的人工智能
百度⼤大脑	

无时不刻
在学习和演进
百亿级参数
构建世界上最大规
模深度神经网络
世界领先的
深度学习算法:
语音识别,图像识别
,自然语言理解,广
告精准匹配,用户
建模,
…
Deep Image:
Scaling up Image Recognition
ImageNet Classification Challenge
ImageNet classification results 	

Team	

 Year	

 Place	

 Error (top-5)	

 Uses external
data	

SuperVision	

 2012 	

 -	

 16.4%	

 no	

SuperVision	

 2012 	

 1st	

 15.3%	

 ImageNet 22k	

Clarifai	

 2013	

 -	

 11.7%	

 no	

Clarifai	

 2013	

 1st	

 11.2%	

 ImageNet 22k	

MSRA	

 2014 	

 3rd	

 7.35%	

 no	

VGG	

 2014	

 2nd	

 7.32%	

 no	

GoogLeNet	

 2014	

 1st	

 6.67%	

 no	

Slide credit: Yangqing Jia, Google	

 Invincible ?
Our approach – Insights and inspirations	
多算胜少算不胜	

	

孙⼦子 计篇	

	

More calculations win, few
calculation lose	

元元本本殚⻅见洽闻	

	

班固 ⻄西都赋 	

	

Meaning the more you see the
more you know	

明⾜足以察秋毫之末	

	

孟⼦子 梁惠⺩王上	

	

ability to see very fine details
Project Minwa – 百度敏娲	

•  Minerva + Athena + ⼥女娲	

•  Athena: Goddess of Wisdom,Warfare,
Divine Intelligence,Architecture, and Crafts	

•  Minerva: Goddess of wisdom, magic,
medicine, arts, commerce and defense	

•  ⼥女娲: 抟⼟土造⼈人, 炼⽯石补天, 婚姻, 乐器	

	

World’s Largest Artificial Neural Networks	

	

v Pushing the State-of-the-Art	

v ~ 100x bigger than google brain	

v New kind of Intelligence?
Hardware/Software Co-design	
•  Stochastic gradient decent (SGD)	

•  High compute density	

•  Scale up, up to 100 nodes	

•  High bandwidth low latency	

•  36 nodes, 144 GPUs, 6.9TB Host, 1.7TB Device	

•  0.6 PFLOPS 	

•  Highly Optimized software stack	

•  RDMA/GPU Direct 	

•  New data partition and communication
strategies	

GPUs	
Infiniband
Speedup ( wall time for convergence ) 	
Validation set accuracy for different numbers of GPUs	
0	
  
0.1	
  
0.2	
  
0.3	
  
0.4	
  
0.5	
  
0.6	
  
0.7	
  
0.8	
  
0.9	
  
0.25	
   0.5	
   1	
   2	
   4	
   8	
   16	
   32	
   64	
   128	
   256	
  
Accuracy
Time (hours)
32 GPU
16 GPU
1 GPU
Accuracy 80%
32 GPU: 8.6 hours
1 GPU: 212 hours
Speedup: 24.7x
Never have enough training
examples!	

	

Key observations 	

•  Invariant to illuminant of the scene	

•  Invariant to observers 	

Augmentation approaches	

•  Color casting	

•  Optical distortion	

•  Rotation and cropping etc	

Data Augmentation	

“⻅见多识⼲⼴广”
Data Augmentation	

Augmentation The number of possible changes
Color casting 68920
Vignetting 1960
Lens distortion 260
Rotation 20
Flipping 2
Cropping 82944(crop size is 224x224, input image
size is 512x512)
Possible variations
The Deep Image system learned from ~2 billion examples, out
of 90 billion possible candidates.
Data augmentation vs. Overfitting
Multi-scale training	

•  Same crop size, different
resolution	

•  Fixed-size 224*224	

•  Downsized training images	

•  Reduces computational costs	

•  But not for state-of-the-art	

•  Different models trained by
different image sizes	

256*256	
512*512	
•  High-resolution model works	

•  256x256: top-5 7.96%	

•  512x512: top-5 7.42% 	

•  Multi-scale models are
complementary	

•  Fused model: 6.97%	

“明查秋毫”
Multi-scale training	

Tricycle	
Washer	
Backpack	
Little blue heron
Model	

•  One basic configuration has 16 layers	

•  The number of weights in our configuration is 212.7M	

•  About 40% bigger than VGG’s	

Team Top-1 val. error Top-5 val. error
GoogLeNet - 7.89%
VGG 25.9% 8.0%
Deep Image 24.88% 7.42%
Compare to state-of-the-art	

Deep Image has set the new record of 5.98% top-5 error rate for test dataset, a
10.2% relative improvement than the previous best result.	
Team Year Place Top-5 test error
SuperVision 2012 1 16.42%
ISI 2012 2 26.17%
VGG 2012 3 26.98%
Clarifai 2013 1 11.74%
NUS 2013 2 12.95%
ZF 2013 3 13.51%
GoogLeNet 2014 1 6.66%
VGG 2014 2 7.32%
MSRA 2014 3 8.06%
Andrew Howard 2014 4 8.11%
DeeperVision 2014 5 9.51%
Deep Image - -
5.98%
Almost human performance	

4.00%	

4.50%	

5.00%	

5.50%	

6.00%	

6.50%	

7.00%	

GoogLeNet	

 Deep Image	

 Human	

Closing the gap by half
Robustness
Classification failure cases	

Groundtruth: Police car	

GoogLeNet:	

●  laptop	

●  hair drier	

●  binocular	

●  ATM machine	

●  seat belt	

	

	

Slide credit: Yangqing Jia, Google
Major differentiators 	

•  Customized built supercomputer dedicated for DL	

•  Simple, scalable algorithm + Fully optimized
software stack	

•  Larger models	

•  More Aggressive data augmentation	

•  Multi-scale, include high-resolution images	

Brute force + Insights and push for extreme
Thank you!

More Related Content

Similar to DeepImage_EmTech-public-small

"Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr...
"Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr..."Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr...
"Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr...Edge AI and Vision Alliance
 
"The Vision AI Start-ups That Matter Most," a Presentation from Cognite Ventures
"The Vision AI Start-ups That Matter Most," a Presentation from Cognite Ventures"The Vision AI Start-ups That Matter Most," a Presentation from Cognite Ventures
"The Vision AI Start-ups That Matter Most," a Presentation from Cognite VenturesEdge AI and Vision Alliance
 
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningMakine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningAli Alkan
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision SlidesIntel® Software
 
Infopulse AI, Data Science & RPA Managed Services
Infopulse AI, Data Science & RPA Managed ServicesInfopulse AI, Data Science & RPA Managed Services
Infopulse AI, Data Science & RPA Managed ServicesInfopulse
 
Theo Gevers (3DUniversum.com) @ Thingscon Amsterdam
Theo Gevers (3DUniversum.com) @ Thingscon AmsterdamTheo Gevers (3DUniversum.com) @ Thingscon Amsterdam
Theo Gevers (3DUniversum.com) @ Thingscon AmsterdamCLICKNL
 
Data Management is a Team Sport - IBM
Data Management is a Team Sport - IBMData Management is a Team Sport - IBM
Data Management is a Team Sport - IBMMongoDB
 
A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...
A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...
A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...Databricks
 
Intelligent Big Data analytics for the future.
Intelligent Big Data analytics for the future.Intelligent Big Data analytics for the future.
Intelligent Big Data analytics for the future.Shashank Garg
 
Understanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfUnderstanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfQualcomm Research
 
Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Turi, Inc.
 
An Elementary Introduction to Artificial Intelligence, Data Science and Machi...
An Elementary Introduction to Artificial Intelligence, Data Science and Machi...An Elementary Introduction to Artificial Intelligence, Data Science and Machi...
An Elementary Introduction to Artificial Intelligence, Data Science and Machi...Dozie Agbo
 
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)byteLAKE
 
Counting the World with AI Models
Counting the World with AI ModelsCounting the World with AI Models
Counting the World with AI ModelsGanes Kesari
 

Similar to DeepImage_EmTech-public-small (20)

"Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr...
"Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr..."Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr...
"Enabling Ubiquitous Visual Intelligence Through Deep Learning," a Keynote Pr...
 
"The Vision AI Start-ups That Matter Most," a Presentation from Cognite Ventures
"The Vision AI Start-ups That Matter Most," a Presentation from Cognite Ventures"The Vision AI Start-ups That Matter Most," a Presentation from Cognite Ventures
"The Vision AI Start-ups That Matter Most," a Presentation from Cognite Ventures
 
ICS1020 CV
ICS1020 CVICS1020 CV
ICS1020 CV
 
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningMakine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision Slides
 
Infopulse AI, Data Science & RPA Managed Services
Infopulse AI, Data Science & RPA Managed ServicesInfopulse AI, Data Science & RPA Managed Services
Infopulse AI, Data Science & RPA Managed Services
 
Theo Gevers (3DUniversum.com) @ Thingscon Amsterdam
Theo Gevers (3DUniversum.com) @ Thingscon AmsterdamTheo Gevers (3DUniversum.com) @ Thingscon Amsterdam
Theo Gevers (3DUniversum.com) @ Thingscon Amsterdam
 
Data Management is a Team Sport - IBM
Data Management is a Team Sport - IBMData Management is a Team Sport - IBM
Data Management is a Team Sport - IBM
 
A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...
A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...
A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...
 
OpenPOWER/POWER9 AI webinar
OpenPOWER/POWER9 AI webinar OpenPOWER/POWER9 AI webinar
OpenPOWER/POWER9 AI webinar
 
Intelligent Big Data analytics for the future.
Intelligent Big Data analytics for the future.Intelligent Big Data analytics for the future.
Intelligent Big Data analytics for the future.
 
Understanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfUnderstanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdf
 
Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015
 
An Elementary Introduction to Artificial Intelligence, Data Science and Machi...
An Elementary Introduction to Artificial Intelligence, Data Science and Machi...An Elementary Introduction to Artificial Intelligence, Data Science and Machi...
An Elementary Introduction to Artificial Intelligence, Data Science and Machi...
 
Knowledge Discovery
Knowledge DiscoveryKnowledge Discovery
Knowledge Discovery
 
Hung DO-DUY - Spikenet
Hung DO-DUY - Spikenet Hung DO-DUY - Spikenet
Hung DO-DUY - Spikenet
 
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
 
ICS1020CV_2022.pdf
ICS1020CV_2022.pdfICS1020CV_2022.pdf
ICS1020CV_2022.pdf
 
Counting the World with AI Models
Counting the World with AI ModelsCounting the World with AI Models
Counting the World with AI Models
 
demo AI ML.pptx
demo AI ML.pptxdemo AI ML.pptx
demo AI ML.pptx
 

DeepImage_EmTech-public-small

  • 1. Advancements in Deep Learning and AI: Stepping inside Baidu’s “Brain” Dr. Ren Wu Distinguished Scientist, Baidu wuren@baidu.com @韧在百度
  • 5. 人 互联网 索引知识 ⼈人机交互 物理世界 传感器, 3D建模 百度大脑    Deep  Learning  Pla-orm   高性能计算 搜索, ⼲⼴广告, 预测,决策 智能硬件, 机器⼈人, ⾃自动驾驶 探索发现, 3D打印 信息感知 思考, 学习 决策, 行动, 创造 全景图:基于百度大脑的人工智能
  • 7. Deep Image: Scaling up Image Recognition
  • 8.
  • 10. ImageNet classification results Team Year Place Error (top-5) Uses external data SuperVision 2012 - 16.4% no SuperVision 2012 1st 15.3% ImageNet 22k Clarifai 2013 - 11.7% no Clarifai 2013 1st 11.2% ImageNet 22k MSRA 2014 3rd 7.35% no VGG 2014 2nd 7.32% no GoogLeNet 2014 1st 6.67% no Slide credit: Yangqing Jia, Google Invincible ?
  • 11. Our approach – Insights and inspirations 多算胜少算不胜 孙⼦子 计篇 More calculations win, few calculation lose 元元本本殚⻅见洽闻 班固 ⻄西都赋 Meaning the more you see the more you know 明⾜足以察秋毫之末 孟⼦子 梁惠⺩王上 ability to see very fine details
  • 12. Project Minwa – 百度敏娲 •  Minerva + Athena + ⼥女娲 •  Athena: Goddess of Wisdom,Warfare, Divine Intelligence,Architecture, and Crafts •  Minerva: Goddess of wisdom, magic, medicine, arts, commerce and defense •  ⼥女娲: 抟⼟土造⼈人, 炼⽯石补天, 婚姻, 乐器 World’s Largest Artificial Neural Networks v Pushing the State-of-the-Art v ~ 100x bigger than google brain v New kind of Intelligence?
  • 13. Hardware/Software Co-design •  Stochastic gradient decent (SGD) •  High compute density •  Scale up, up to 100 nodes •  High bandwidth low latency •  36 nodes, 144 GPUs, 6.9TB Host, 1.7TB Device •  0.6 PFLOPS •  Highly Optimized software stack •  RDMA/GPU Direct •  New data partition and communication strategies GPUs Infiniband
  • 14. Speedup ( wall time for convergence ) Validation set accuracy for different numbers of GPUs 0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   0.25   0.5   1   2   4   8   16   32   64   128   256   Accuracy Time (hours) 32 GPU 16 GPU 1 GPU Accuracy 80% 32 GPU: 8.6 hours 1 GPU: 212 hours Speedup: 24.7x
  • 15. Never have enough training examples! Key observations •  Invariant to illuminant of the scene •  Invariant to observers Augmentation approaches •  Color casting •  Optical distortion •  Rotation and cropping etc Data Augmentation “⻅见多识⼲⼴广”
  • 16. Data Augmentation Augmentation The number of possible changes Color casting 68920 Vignetting 1960 Lens distortion 260 Rotation 20 Flipping 2 Cropping 82944(crop size is 224x224, input image size is 512x512) Possible variations The Deep Image system learned from ~2 billion examples, out of 90 billion possible candidates.
  • 17. Data augmentation vs. Overfitting
  • 18. Multi-scale training •  Same crop size, different resolution •  Fixed-size 224*224 •  Downsized training images •  Reduces computational costs •  But not for state-of-the-art •  Different models trained by different image sizes 256*256 512*512 •  High-resolution model works •  256x256: top-5 7.96% •  512x512: top-5 7.42% •  Multi-scale models are complementary •  Fused model: 6.97% “明查秋毫”
  • 20. Model •  One basic configuration has 16 layers •  The number of weights in our configuration is 212.7M •  About 40% bigger than VGG’s Team Top-1 val. error Top-5 val. error GoogLeNet - 7.89% VGG 25.9% 8.0% Deep Image 24.88% 7.42%
  • 21. Compare to state-of-the-art Deep Image has set the new record of 5.98% top-5 error rate for test dataset, a 10.2% relative improvement than the previous best result. Team Year Place Top-5 test error SuperVision 2012 1 16.42% ISI 2012 2 26.17% VGG 2012 3 26.98% Clarifai 2013 1 11.74% NUS 2013 2 12.95% ZF 2013 3 13.51% GoogLeNet 2014 1 6.66% VGG 2014 2 7.32% MSRA 2014 3 8.06% Andrew Howard 2014 4 8.11% DeeperVision 2014 5 9.51% Deep Image - - 5.98%
  • 24. Classification failure cases Groundtruth: Police car GoogLeNet: ●  laptop ●  hair drier ●  binocular ●  ATM machine ●  seat belt Slide credit: Yangqing Jia, Google
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30. Major differentiators •  Customized built supercomputer dedicated for DL •  Simple, scalable algorithm + Fully optimized software stack •  Larger models •  More Aggressive data augmentation •  Multi-scale, include high-resolution images Brute force + Insights and push for extreme