Intel Lake Crest
Yutaka Yasuda, Kyoto Sangyo University, 2016/12/16
AI
2016.3 AlphaGo defeats Lee Sedol
2016.9 Google moves Translate to neural machine translation (AI)
2015 Google Photos launches with AI-based image search
“Google's AlphaGo AI Continues to Wallop Expert Human Go Player”, Popular Mechanics, 2016/3/10

http://www.popularmechanics.com/technology/a19863/googles-alphago-ai-wins-second-game-go/
Deep Learning
2014 ImageNet: Google wins with a network more than 20 layers deep (GoogLeNet)
2012 Google's "cat" experiment: a neural network learns to recognize cats from YouTube stills


“Deep Visual-Semantic Alignments for Generating Image Descriptions”,
Andrej Karpathy, Li Fei-Fei, Stanford University, CVPR 2015
Neural Network = a network modeled on neurons
https://en.wikipedia.org/wiki/Artificial_neural_network
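A minimal sketch (my illustration, not the deck's) of what a single artificial neuron computes: a weighted sum of its inputs passed through a nonlinear activation. All values are made up:

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: weighted input sum plus bias,
    squashed by a sigmoid activation into (0, 1)."""
    z = np.dot(w, x) + b               # weighted sum
    return 1.0 / (1.0 + np.exp(-z))    # sigmoid

x = np.array([0.5, -1.0, 2.0])   # inputs (made up)
w = np.array([0.8, 0.2, -0.4])   # weights (made up)
b = 0.1                          # bias
print(neuron(x, w, b))           # ≈ 0.38
```

A network is just layers of many such neurons, which is why dense linear algebra dominates the computation.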
“Introduction to multi gpu deep learning with DIGITS 2”, Mike Wang
http://www.slideshare.net/papisdotio/introduction-to-multi-gpu-deep-learning-with-digits-2-mike-wang/6
https://www.youtube.com/watch?v=BMEffRAvnk4
Why nVIDIA?
Lake Crest
Intel Artificial Intelligence Day
2016/11/17 12:30 PM PT, San Francisco
http://pc.watch.impress.co.jp/docs/column/ubiq/1030981.html
Intel Nervana Engine
https://www.nervanasys.com/technology/engine/
ASIC
A CPU is general purpose; an ASIC is hard-wired for one application. A GPU can be viewed as an ASIC for graphics.
→ Wikipedia: “ASIC”
Nervana Engine (from the official web site)
2.5D packaging
“Blazingly fast data access via high-bandwidth memory (HBM)”
Processing Cluster x12 (3x4)

ICL (Inter Chip Link) x12
8GB HBM2 x4
HBM?
“An Introduction to HBM - High Bandwidth Memory - Stacked Memory and The Interposer”
http://www.guru3d.com/articles-pages/an-introduction-to-hbm-high-bandwidth-memory,2.html
• HBM stacks DRAM dies vertically
• The stack sits next to the GPU on a silicon interposer
• This style of packaging is called 2.5D
             GDDR5                        HBM2
Bus width    32-bit                       1024-bit
Data rate    up to 1750 MHz (7 Gbps)      2 Gbps
Bandwidth    up to 28 GB/s per chip       256 GB/s (2 Tb/s) per stack
Voltage      1.5 V                        1.3 V
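The HBM2 figures in the table follow from the bus width and per-pin data rate; a quick arithmetic check:

```python
# HBM2: 1024-bit bus per stack, 2 Gbit/s per pin
hbm2_gbps = 1024 * 2            # 2048 Gbit/s ≈ 2 Tbit/s per stack
print(hbm2_gbps / 8, "GB/s")    # 256.0 GB/s per stack

# GDDR5: 32-bit bus per chip, 7 Gbit/s per pin
print(32 * 7 / 8, "GB/s")       # 28.0 GB/s per chip
```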
LGA 2011: a CPU socket with 2011 pins
(Xeon E5 1600/2600 v4, Broadwell-EP)
An entire CPU makes do with about 2000 pins, yet HBM needs 1024 signals × 4 stacks = 4096; hence the interposer
→ Wikipedia: LGA 2011
http://pc.watch.impress.co.jp/docs/column/ubiq/1030981.html
Tensor




https://www.tensorflow.org
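An aside of my own: a tensor is just an n-dimensional array, and TensorFlow's name comes from flowing such arrays through a computation graph. In numpy terms:

```python
import numpy as np

scalar = np.array(3.0)              # rank-0 tensor
vector = np.array([1.0, 2.0])       # rank-1 tensor
matrix = np.eye(3)                  # rank-2 tensor
batch  = np.zeros((32, 28, 28, 1))  # rank-4 tensor: a batch of MNIST-sized images

for t in (scalar, vector, matrix, batch):
    print(t.ndim, t.shape)          # rank and shape
```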
“TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems”, Abadi et al., 2015
https://arxiv.org/abs/1603.04467v2
https://www.tensorflow.org/tutorials/mnist/beginners/
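The linked tutorial trains softmax regression on MNIST, and its forward pass boils down to one matrix multiply plus a softmax. A rough numpy sketch of that step (shapes follow the tutorial; the weights here are random, not trained):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1, 784))                   # one flattened 28x28 image
W = rng.standard_normal((784, 10)) * 0.01  # weight matrix (untrained)
b = np.zeros(10)                           # biases

logits = x @ W + b                         # the core tensor operation
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the 10 digits
print(probs.round(3))
```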
https://en.wikipedia.org/wiki/Artificial_neural_network
TensorFlow runs on GPU or CPU
Nervana Engine
• An ASIC dedicated to tensor operations
• HBM2 × 4 stacks
• The HBM bus is 1024 bits wide!
• 2.5D packaging
Nervana Engine interconnect
• 12 inter-chip links (ICL)
• 100 Gbit/s each


https://www.nervanasys.com/technology/engine/
100 Gbit/s × 12 links = 1.2 Tbit/s aggregate
Deep Learning and the GPU
• Why has the GPU become the workhorse of deep learning?
• A GPU is a SIMD machine: one instruction drives many data elements in parallel
http://logmi.jp/45705
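A toy contrast of my own between scalar and SIMD-style execution, with numpy's vectorized operations standing in for the GPU's wide lanes:

```python
import numpy as np

a = np.arange(100_000, dtype=np.float32)
b = np.ones(100_000, dtype=np.float32)

# Scalar style: one element per step
c_scalar = [x + y for x, y in zip(a, b)]

# SIMD style: one vectorized operation over all elements at once
c_simd = a + b

assert np.allclose(c_scalar, c_simd)
```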
GPU = SIMD
• A GPU computes in 32-bit floating point
• Does AI really need 32 bits of precision, whether on GPU or CPU?
• So far AI acceleration has meant the GPU, i.e. nVIDIA rather than the CPU vendors
https://www.tensorflow.org/tutorials/mnist/beginners/
From the GPU to the Nervana Engine
Binary Neural Network
• A GPU works in 32-bit floats
• BNN - Binarized Neural Network: weights and activations restricted to ( -1 / +1 )
Nervana blog post:
“Accelerating Neural Networks with Binary Arithmetic”
https://www.nervanasys.com/accelerating-neural-networks-binary-arithmetic/
“These 32 bit floating point multiplications, however, are very expensive. In BNNs, floating point multiplications are supplanted with bitwise XNORs and left and right bit shifts. This is extremely attractive from a hardware perspective: binary operations can be implemented computationally efficiently at a low power cost.”
• 32-bit floating point multiplies are expensive
• BNN replaces them with XNOR and bit shifts
Nervana Engine
• A GPU is tied to 32-bit SIMD; a dedicated ASIC can implement BNN directly
• Multiplication becomes a single XNOR
• Encode -1 as 0 and +1 as 1
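A worked check of my own (not Nervana's code) that this encoding makes multiplication an XNOR: two ±1 factors multiply to +1 exactly when their encoded bits match, so a dot product reduces to XNOR plus a popcount:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.choice([-1, +1], size=16)   # binarized activations
w = rng.choice([-1, +1], size=16)   # binarized weights

# Encode -1 as 0 and +1 as 1
a_bits = (a > 0).astype(np.uint8)
w_bits = (w > 0).astype(np.uint8)

xnor = 1 - (a_bits ^ w_bits)        # 1 where bits match, i.e. product is +1

# dot = (#matches) - (#mismatches) = 2 * popcount(xnor) - n
dot_via_xnor = 2 * int(xnor.sum()) - len(a)
assert dot_via_xnor == int(np.dot(a, w))
print(dot_via_xnor)
```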
Tensor processing hardware
• GPU: nVIDIA
• Intel: Xeon Phi (AVX-512 SIMD)
http://www.4gamer.net/games/049/G004963/20161007061/
• Intel: Nervana Engine
https://software.intel.com/en-us/blogs/2013/avx-512-instructions
Deep Learning hardware today: the nVIDIA GPU
Deep Learning-specific approaches:
• Nervana: Binarized arithmetic + HBM2
• nVIDIA: FP16
• Intel: AVX-512 SIMD
• Google: TPU (Tensor Processing Unit), 8-bit arithmetic, not a general-purpose CPU!
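As a hedged sketch of the 8-bit idea (my illustration; Google's actual TPU quantization differs in detail): scale floats down to int8, multiply-accumulate in integers, then rescale:

```python
import numpy as np

def quantize(x, scale):
    """Map floats to int8 with a fixed scale factor (illustrative only)."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

a = np.array([0.50, -1.25, 0.75], dtype=np.float32)
b = np.array([1.00,  0.25, -0.50], dtype=np.float32)
scale = 0.01                        # chosen by hand for this example

# Integer multiply-accumulate, then rescale back to float
acc = np.dot(quantize(a, scale).astype(np.int32),
             quantize(b, scale).astype(np.int32))
print(acc * scale * scale, "vs exact", float(np.dot(a, b)))  # -0.1875 both
```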
Summary
• Google and the other giants are moving to purpose-built AI silicon
• Floating point multiply replaced by XNOR / bit shift
• Specialized chips, not a general-purpose CPU
• 100Gbps × 12 links to scale out
• A different bet from GPU-style SIMD
'You've got to find what you love,' Jobs says

Steve Jobs, 2005, Stanford University

https://www.youtube.com/watch?v=UF8uR6Z6KLc
“Follow your heart”
A look into Lake Crest