Computational
graphs
Presenters -
Mohamed Aboeleinen
A H M Forhadul Islam
Outline
➔ Computational Graphs
◆ How it works?
◆ Forward and Backward Propagation
➔ TensorFlow as an example
◆ History
◆ Features
◆ Parallelism
◆ Uses and Case Study
➔ CNTK and other tools
◆ Comparison
◆ TensorFlow on Spark
1.
Computational
Graphs
Representation of a composite function
as a network of connected nodes
How it works?
Nodes
They can be a math operation or persistent
data.
Flow of data
The Output of each step is an input to the
next.
Edges
They represent the flow of data in the
form of multidimensional arrays (tensors).
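A minimal sketch of the idea above, assuming a hypothetical `Node` class (not part of TensorFlow or CNTK) and scalar values instead of multidimensional arrays. It represents the composite function f(x, y, z) = (x + y) * z as a network of connected nodes, where the output of each step is the input to the next.

```python
# Minimal computational graph: nodes are operations or stored data,
# edges carry values from one node to the next.

class Node:
    """A graph node: either persistent data (op is None) or a math operation."""

    def __init__(self, name, op=None, inputs=(), value=None):
        self.name = name
        self.op = op                # callable for operation nodes, None for data nodes
        self.inputs = list(inputs)  # incoming edges
        self.value = value          # stored data, or cached result after evaluation

    def evaluate(self):
        # Data nodes simply return their stored value.
        if self.op is None:
            return self.value
        # Operation nodes evaluate their inputs first, then apply the operation.
        args = [node.evaluate() for node in self.inputs]
        self.value = self.op(*args)
        return self.value


# Composite function f(x, y, z) = (x + y) * z expressed as a graph.
x = Node("x", value=2.0)
y = Node("y", value=3.0)
z = Node("z", value=4.0)
add = Node("add", op=lambda a, b: a + b, inputs=[x, y])
mul = Node("mul", op=lambda a, b: a * b, inputs=[add, z])

result = mul.evaluate()
print(result)  # (2 + 3) * 4 = 20.0
```

Real frameworks pass tensors along the edges and schedule nodes across devices, but the structure is the same.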
Forward Propagation and Backward Propagation
Forward Propagation
Computational graphs can be used to
show the flow of computation. In the forward
pass, information always moves in one
direction; it never goes backwards.
It can be used to:
● Mimic brain activity.
● Model the behaviour of different
systems: social networks,
road networks.
Backward Propagation
The gradient of the error (cost) function is
propagated backwards through the graph,
adjusting the weights and biases to
minimize the cost. This is the learning
process (a fundamental concept of
machine learning).
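A small sketch of one forward pass, one backward pass, and the resulting weight update, assuming a single linear neuron with a squared-error cost. The function names, the learning rate, and the data point are illustrative choices, not taken from the slides.

```python
# Forward pass, backward pass, and weight update for one linear neuron.

def forward(w, b, x):
    """Forward propagation: compute the prediction."""
    return w * x + b

def loss_fn(pred, target):
    """Squared-error cost."""
    return (pred - target) ** 2

def backward(w, b, x, target):
    """Backward propagation: apply the chain rule from the cost back to w and b."""
    pred = forward(w, b, x)
    dloss_dpred = 2.0 * (pred - target)  # d(cost)/d(pred)
    dloss_dw = dloss_dpred * x           # d(pred)/d(w) = x
    dloss_db = dloss_dpred * 1.0         # d(pred)/d(b) = 1
    return dloss_dw, dloss_db

w, b = 0.0, 0.0
x, target = 2.0, 10.0
learning_rate = 0.05

initial_loss = loss_fn(forward(w, b, x), target)
for _ in range(100):
    grad_w, grad_b = backward(w, b, x, target)
    # Adjust the weight and bias against the gradient to reduce the cost.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
final_loss = loss_fn(forward(w, b, x), target)

print(initial_loss, final_loss)
```

Frameworks such as TensorFlow and CNTK derive the backward step automatically from the graph; here it is written by hand to make the chain rule visible.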
2.
TensorFlow
Numerical computation using data flow
graphs
History
2011
Andrew Ng of Stanford, alongside Jeff Dean and Greg
Corrado of Google, started to build a large-scale deep
learning software system, “DistBelief”.
Nov., 2015
Google releases an open source version of
TensorFlow.
May, 2016
Google announces its TensorFlow-tailored
ASIC (TPU).
And now this!
https://youtu.be/-F-TQJtbFMs
TF!
Main features
Runs with little or no change across
platforms: from phones and tablets up
to large-scale distributed systems with
thousands of computational
devices such as GPU cards.
Flexible: can be used for a wide range of
algorithms.
Used in many applications: NLP,
Robotics, Drug Discovery, Speech
Recognition.
Supports parallelism.
Parallelism in TensorFlow (Distributed TensorFlow)
Data Parallelism
Different subsets of the data are processed
on different nodes in a cluster,
followed by parameter averaging
and replacement across the cluster.
Model Parallelism
Different parts of the model are
trained on different devices.
Example: train a stacked RNN by
deploying each RNN layer on a different
device.
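A rough sketch of data parallelism with parameter averaging, simulated in a single process. The workers, the one-parameter model y = w * x, and the helper names are assumptions made for illustration; this is not the Distributed TensorFlow API.

```python
# Data parallelism sketch: each simulated worker trains on its own shard,
# then the parameters are averaged and sent back to every worker.

def local_step(w, shard, learning_rate):
    """One gradient-descent step for the model y = w * x on a single shard."""
    grad = sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)
    return w - learning_rate * grad

def train_data_parallel(data, num_workers, rounds, learning_rate=0.01):
    # Split the data into one subset (shard) per worker.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w = 0.0
    for _ in range(rounds):
        # Every worker starts the round from the same shared parameter.
        local_ws = [local_step(w, shard, learning_rate) for shard in shards]
        # Parameter averaging, then replacement across the "cluster".
        w = sum(local_ws) / len(local_ws)
    return w

# Synthetic data generated from y = 3 * x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = train_data_parallel(data, num_workers=4, rounds=200)
print(w)
```

In a real cluster each `local_step` would run on a separate node and the averaging would happen over the network; the loop above only shows the order of operations.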
“When you train a model you use
variables to hold and update parameters.
Variables are in-memory buffers
containing tensors”
Support
◎ TensorFlow supports Python and C++
◎ Horizontal scaling across machines using gRPC
◎ Runs on CPUs and GPUs (via CUDA)
◎ Convolutional neural network (CNN)
◎ Recurrent neural network (RNN)
◎ Long short-term memory (LSTM)
TensorFlow Uses
RankBrain
Org: Google
Task: Information
Retrieval
A large-scale
deployment of deep
neural nets for search
ranking on google.com
Massively Multitask
Networks for Drug
Discovery
Org: Stanford and
Google
Task: Drug discovery
A deep neural network
model for identifying
promising drug
candidates
On-Device
Computer Vision
for OCR
Org: Google
Task: Translation
On-device computer
vision model to do
optical character
recognition to enable
real-time translation.
TensorFlow in
Medicine:
Retinal Imaging
Case Study
Case Study: TensorFlow in Medicine
Retinal Imaging Project by Lily Peng, MD, PhD of Google.
● Diabetic retinopathy (DR) is the fastest growing cause of
blindness in the world.
● All diabetic patients (more than 400 M) have to be
checked yearly to detect DR before it is
too late to intervene.
Image Courtesy: Google 2017 Development Summit slides.
Is it still a good idea to go to the doctor?
● Even if a doctor is available, there is a high chance of an
intolerable false negative.
DR severity grades shown: None, Mild, Moderate, Severe, Proliferative
How did TensorFlow help?
● Easier for people who are not deep learning experts to implement and prototype.
● Hardware support (GPUs, for example).
● Moves the challenge from modeling and training to finding the
right problem, getting the data and consent, validation, and
deployment.
TensorBoard
Visualization of graph structures and summary statistics.
3.
CNTK (Cognitive Toolkit)
A unified deep-learning toolkit that
describes neural networks as a series of
computational steps via a directed
graph
+ Points
Open source
Free
No ads
CNTK!
◎ At the very core of CNTK is the
compute graph.
◎ Each CNTK compute graph is
composed of a set of nodes,
where each node represents a
key mathematical operation.
◎ The edges between nodes in
the graph represent data flow
between operations.
◎ Supports GPU/multi-server.
CNTK benchmark
In the published benchmark, CNTK appears to be a very powerful
tool for both vertical and horizontal scaling.
3 main tasks ( TED )
Train
Define a network and train it to produce a
trained model using training data
Evaluate
Test a trained model to assess its performance
using test data
Deploy
Use a trained model, e.g. in your own solution,
to classify new instances
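The three tasks can be sketched end to end with a toy model. The example below is an assumption-laden illustration: it uses a single-layer perceptron in plain Python rather than the CNTK API, and the function names and data points are made up for the demonstration.

```python
# Train / Evaluate / Deploy with a tiny perceptron (illustrative only).

def predict(model, features):
    """Deploy step: classify one instance with a trained model."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, features)) + b
    return 1 if score > 0 else 0

def train(training_data, epochs=10):
    """Train step: fit weights and bias with the perceptron update rule."""
    w = [0.0] * len(training_data[0][0])
    b = 0.0
    for _ in range(epochs):
        for features, label in training_data:
            error = label - predict((w, b), features)
            w = [wi + error * xi for wi, xi in zip(w, features)]
            b += error
    return w, b

def evaluate(model, test_data):
    """Evaluate step: fraction of test instances classified correctly."""
    correct = sum(predict(model, f) == label for f, label in test_data)
    return correct / len(test_data)

training_data = [
    ((2.0, 2.0), 1), ((3.0, 3.0), 1), ((2.0, 3.0), 1),
    ((-2.0, -2.0), 0), ((-3.0, -1.0), 0), ((-1.0, -3.0), 0),
]
test_data = [((4.0, 4.0), 1), ((-4.0, -4.0), 0)]

model = train(training_data)            # Train
accuracy = evaluate(model, test_data)   # Evaluate
new_label = predict(model, (5.0, 1.0))  # Deploy: classify a new instance
print(accuracy, new_label)
```

Keeping the test data separate from the training data is what makes the Evaluate step meaningful.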
Uber is using driver
selfies to enhance
security, powered
by Microsoft
Cognitive Services
Emotion API
{
"faceRectangle": {
"left": 732,
"top": 201,
"width": 191,
"height": 191
},
"scores": {
"anger": 0.0000262709273,
"contempt": 0.0000356922574,
"disgust": 0.0000494521,
"fear": 0.000004611067,
"happiness": 0.999419332,
"neutral": 0.00043529534,
"sadness": 0.00000680253424,
"surprise": 0.0000225645144
}
…...
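A short sketch of how a client might read such a response, assuming the per-face object is well-formed JSON. The closing braces below are added only so the sample parses; the elided part of the response is not reconstructed, and the variable names are illustrative.

```python
import json

# Sample response from the slide, closed off so that it parses as JSON.
response_text = """
{
  "faceRectangle": {"left": 732, "top": 201, "width": 191, "height": 191},
  "scores": {
    "anger": 0.0000262709273,
    "contempt": 0.0000356922574,
    "disgust": 0.0000494521,
    "fear": 0.000004611067,
    "happiness": 0.999419332,
    "neutral": 0.00043529534,
    "sadness": 0.00000680253424,
    "surprise": 0.0000225645144
  }
}
"""

face = json.loads(response_text)
scores = face["scores"]

# Pick the emotion with the highest score.
dominant = max(scores, key=scores.get)
print(dominant, round(scores[dominant], 4))  # happiness 0.9994
```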
Voice recognition
Motion Detection
Face detection and tracking
Language checking ( speaking )
Speaker recognition
MtNet is a new deep learning malware classification architecture
MtNet was trained on labeled data
6,500,000 files were tested
2.94% family error rate
0.358% error rate
TensorFlow vs. Torch vs. Theano
Marketing itself
TensorFlow took the lead, which should
result in a stronger community of users
and faster development.
Entities’ support
TensorFlow is backed by Google, while
Theano is still supported by the University
of Montreal. Torch is supported by
Facebook, Twitter, and NVIDIA.
Visualizations
TensorFlow has better computational
graph visualizations, but for images
and graphs Theano is just as good.
Debugging
Torch's “automatic differentiation” is
better for debugging than the
“symbolic computation” of Theano
and TensorFlow.
Large-Scale
Machine Learning
How to train a big model
over big data?
TensorFlowOnSpark
❏ Combines salient features from the deep learning
framework TensorFlow and the big-data frameworks
Apache Spark and Apache Hadoop
Honourable mentions
DL4J
DeepLearning4J
Seeing AI project
https://youtu.be/R2mC-NUAmMk
Questions?
Thank you
