Cognitive Toolkit - Deep Learning framework from Microsoft
Łukasz Grala
lukasz@tidk.pl | lukasz.grala@cs.put.poznan.pl
Łukasz Grala
• Data Architect at TIDK
• Creator of "Data Scientist as a Service"
• Certified Microsoft Trainer and university lecturer
• Author of advanced training courses and workshops, as well as numerous publications and webcasts
• Honored with the Microsoft Data Platform MVP award every year since 2010
• PhD student at Poznań University of Technology, Faculty of Computing (databases, data mining, machine learning)
• Speaker at numerous conferences in Poland and worldwide
• Holder of numerous certifications (MCT, MCSE, MCSA, MCITP, …)
• Board member of the Polish Information Processing Society, Wielkopolska Branch
• Member and leader of Data Community Poland (formerly the Polish SQL Server User Group, PLSSUG)
• Passionate about data analysis, storage, and processing; jazz and MTB enthusiast
email lukasz@tidk.pl - lukasz.grala@cs.put.poznan.pl blog: grala.it
Agenda
• Overview
• Artificial Neural Networks & Deep Learning
• Software & Frameworks
• Cognitive Toolkit (aka CNTK)
Overview
Cognitive Toolkit - Deep Learning framework from Microsoft
Machine Learning
A brief timeline:
• 1763: the underpinnings of Bayes' Theorem
• 1805: Least Squares
• 1812: Bayes' Theorem
• 1913: Markov Chains
• 1950: Turing's Learning Machine
• 1951: first neural network machine
• 1958: single-layer neural network on a room-size computer
• 1967: Nearest Neighbor
• 1982: Recurrent Neural Network
• 1995: Random Forest algorithm, Support Vector Machines
• 1997: IBM Deep Blue beats Kasparov
• 2012: recognizing cats on YouTube
• 2016: AlphaGo
Machine Learning
“Can machines do what we (as thinking entities) can do?”
Alan Turing, “Computing Machinery and Intelligence”, Mind, 1950
Turing’s test
Machine Learning
“Machine Learning is a field of computer science that gives computers the ability to learn without being explicitly programmed.”
Arthur Samuel, “Some Studies in Machine Learning Using the Game of Checkers”, IBM Journal of Research and Development, 1959
Machine Learning
“A computer program is said to learn from experience
E with respect to some class of tasks T and
performance measure P if its performance at tasks in
T, as measured by P, improves with experience E.”
Tom M. Mitchell, “Machine Learning”, McGraw Hill, 1997
Supervised
Each point in the training data is associated with a label or output
The task is to learn a model/hypothesis that predicts outputs for points not
in the training dataset
Classification
Given a set of features, predict discrete outputs (Fraud/Not Fraud)
Regression
Given a set of features, predict continuous outputs (Credit score, item price, …)
Recommendation
Given a set of {user, item, rating} triplets and optionally features about users and items,
predict ratings for an item, items similar to a given item, users similar to a given user
Anomaly detection
Given a set of features for “normal” examples,
predict normal vs anomaly
Unsupervised
Points in the training dataset are not associated with known output values
The goal is to create a model that learns the inherent structure of the training data
Clustering
Given a training dataset, find a small number of centers, ‘k’, that are “close” to points in the dataset
Each point in the dataset is associated with at most a single center
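As an illustrative sketch (not from the original slides), the classic k-means procedure implements exactly this idea; the data and the value of k below are made up:

import numpy as np

# Minimal k-means sketch: alternate between assigning points to their
# nearest center and moving each center to the mean of its points.
X = np.random.randn(500, 2)                 # made-up 2D dataset
k = 3
centers = X[np.random.choice(len(X), k, replace=False)]
for _ in range(20):
    # assign each point to its nearest center
    labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
    # move each center to the mean of its assigned points (keep empty clusters in place)
    centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])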
Principal Component Analysis
Given a training dataset with ‘N’ features, find a set of ‘k’ features that approximates the data with bounded
error
The set of ‘k’ principal components is representative of the original dataset but with much lower
dimensionality
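As a minimal sketch (again not from the slides), the principal components can be computed from the singular value decomposition; all dimensions below are made up:

import numpy as np

# PCA sketch: project N-dimensional data onto its k leading principal components
X = np.random.randn(1000, 50)               # 1000 samples, N = 50 features
Xc = X - X.mean(axis=0)                     # center the data first
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = principal directions
k = 5
X_reduced = Xc @ Vt[:k].T                   # same data, k dimensions, bounded error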
Time Series
ARIMA, ETS,…
Machine Learning
• Regression: estimate product demand, predict sales figures, analyze marketing returns
• Anomaly Detection: predict credit risk, detect fraud, catch abnormal equipment readings
• Clustering: perform customer segmentation, predict customer tastes, determine market price
• Classification: two-class classification, multi-class classification
• Deep Learning: vision and speech, text, time series
Artificial Neural Networks
& Deep Learning
Machine Learning Introduction
Biological Neural Networks
Artificial Neural Networks
ANNs are processing devices (algorithms or actual hardware) that are loosely modeled after the neuronal
structure of the mammalian cerebral cortex, but on much smaller scales.
The simplest definition of a neural network, more properly referred to as an 'artificial' neural network
(ANN), is provided by the inventor of one of the first neurocomputers, Dr. Robert Hecht-Nielsen. He
defines a neural network as:
"...a computing system made up of a number of simple, highly interconnected processing
elements, which process information by their dynamic state response to external inputs. “
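To make the definition concrete, here is a minimal sketch of one such processing element in Python; the inputs and weights are made up:

import numpy as np

# One artificial neuron: a weighted sum of inputs passed through a
# nonlinear activation (sigmoid here).
def neuron(x, w, b):
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

x = np.array([0.5, -1.2, 3.0])   # inputs arriving from other units
w = np.array([0.4, 0.1, -0.6])   # connection weights
print(neuron(x, w, 0.1))         # the unit's output, in (0, 1)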
Neural Networks
Neural Networks
Overfitting
In the classic illustration, a green line represents an overfitted model and a black line a regularized one. While the green line follows the training data most closely, it is overly dependent on that data and is likely to show a higher error rate on new, unseen data than the black line.
Deep Learning Use Cases
• Text: sentiment analysis, augmented search, fraud detection, NLP
• Video and Image: facial recognition, emotion recognition, image search, photo clustering, tags, motion detection
• Sound: voice recognition, voice search, sentiment analysis, flaw detection
• Time Series: prediction, recommendation, risk detection
Convolutional network
Convolutional neural network (CNN, or ConvNet) is a class of deep,
feed-forward artificial neural networks that has successfully been
applied to analyzing visual imagery.
Convolutional network
ImageNet CNN
• The model that won the ImageNet competition in 2012
• 5 convolutional layers and 2 fully connected layers
• ReLU units and Dropout in the top layer; 60 million parameters
• 1.2 million training images
• Classification into 1000 classes
• Trained on two GPUs for a week
• 16.4% error (second place: 26.2%)
ImageNet CNN
Recurrent Neural Networks
A recurrent neural network (RNN) is a class of artificial neural
network where connections between units form a directed cycle.
Long Short-Term Memory Network
Hopfield Network Model
Boltzmann Machine Network
Deep Belief Network
Deep Belief Network
Deep Auto-encoders
ImageNet (ILSVRC top-5 error)
ImageNet
2017 - Fast R-CNN (from Microsoft Research)
Software & Frameworks
Cognitive Toolkit - Deep Learning framework from Microsoft
Deep Learning - Frameworks
Benchmark CNTK
https://github.com/Alexey-Kamenev/Benchmarks
https://github.com/Microsoft/CNTK
MicrosoftML
Learners
Algorithms            Strengths
rxFastLinear          Fast, accurate linear learner with automatic L1 & L2 regularization
rxLogisticRegression  Logistic regression with L1 & L2 regularization
rxFastTree            Boosted decision tree from Bing; competitive with XGBoost; the most accurate learner in most cases
rxFastForest          Random forest
rxNeuralNet           GPU-accelerated Net# DNNs with convolutions
rxOneClassSvm         Anomaly detection or unbalanced binary classification
Learners - Scalability
• Streaming (not RAM bound)
• Billions of features
• Multi-proc
• GPU acceleration for DNNs
• Distributed on Hadoop/Spark via Ensembling
Sentiment Analysis
• Pre-trained model
• Cognitive Service Parity
• Uses DNN Embedding
• Domain Adaptation
Image Featurization
Convolutional DNNs with GPU
Pre-trained Models
• ResNet-18
• ResNet-50
• ResNet-101
• AlexNet
ONNX is a community project created by Facebook and Microsoft.
ONNX provides a definition of an extensible computation graph model, as well as
definitions of built-in operators and standard data types.
Each computation dataflow graph is structured as a list of nodes that form an
acyclic graph. Nodes have one or more inputs and one or more outputs. Each
node is a call to an operator. The graph also has metadata to help document its
purpose, author, etc.
Operators are implemented externally to the graph, but the set of built-in
operators is portable across frameworks. Every framework supporting ONNX will
provide implementations of these operators on the applicable data types.
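As a sketch of how this looks from CNTK (assuming CNTK 2.3.1 or later, where ONNX export is available), a trained function can be saved in ONNX format; the model and file path below are placeholders:

import cntk as C

# Placeholder model standing in for a trained C.Function
z = C.layers.Dense(10)(C.input_variable(784))
# Save in ONNX format so other ONNX-capable frameworks can load it
z.save("model.onnx", format=C.ModelFormat.ONNX)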
Cognitive Toolkit
Cognitive Toolkit - Deep Learning framework from Microsoft
Cognitive Toolkit
• FFN, CNN, RNN/LSTM, Batch normalization, Sequence-to-Sequence with
attention and more
• Reinforcement learning, generative adversarial networks, supervised and
unsupervised learning
• Ability to add new user-defined core components on the GPU from Python
• Automatic hyperparameter tuning
• Built-in readers optimized for massive datasets
• Full APIs for defining networks, learners, readers, training and evaluation from
Python, C++, C#, BrainScript
• Evaluate models with Python, C++, C#, R and BrainScript
• Automatic shape inference based on your data
CNTK – Layers Library
Simple one-hidden-layer model – the Dense() function (sketched below)
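A minimal sketch of such a model in the Python API; the dimensions here are assumed (MNIST-like inputs), not taken from the original slide:

import cntk as C

# One hidden Dense layer plus a linear output layer (10-way classification)
features = C.input_variable(784)                      # e.g. a flattened 28x28 image
h = C.layers.Dense(200, activation=C.relu)(features)  # hidden layer
z = C.layers.Dense(10, activation=None)(h)            # output logits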
CNTK – Layers Library
Alternative: Sequential()
A 2011-style feed-forward speech-recognition network
with 6 hidden sigmoid layers of identical dimensions (sketched below)
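A sketch of that network with Sequential() and For(); the layer and output dimensions below are assumed, not taken from the original slide:

import cntk as C
from cntk.layers import Sequential, Dense, For

# Six identical sigmoid hidden layers, then a linear output layer
model = Sequential([
    For(range(6), lambda: Dense(2048, activation=C.sigmoid)),
    Dense(9000, activation=None)    # e.g. senone posteriors in speech models
])
features = C.input_variable(429)    # assumed acoustic feature dimension
z = model(features)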
CNTK – Layers Library
Multiple Layers
CNTK – Layers Library
Recurrence over single and multiple layers

Recurrence(step_function, go_backwards=default_override_or(False),
initial_state=default_override_or(0), return_full_state=False, name='')
RecurrenceFrom(step_function, go_backwards=default_override_or(False),
return_full_state=False, name='')
Fold(folder_function, go_backwards=default_override_or(False),
initial_state=default_override_or(0), return_full_state=False, name='')
UnfoldFrom(generator_function, until_predicate=None, length_increase=1, name='')
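For example (a sketch with assumed dimensions), Recurrence() can run an LSTM step function over a sequence, and C.sequence.last keeps only the final state for classification:

import cntk as C
from cntk.layers import Sequential, Embedding, Recurrence, LSTM, Dense

# Sequence classifier: embed tokens, run an LSTM over the sequence,
# keep the last hidden state, then classify
model = Sequential([
    Embedding(150),                  # assumed embedding dimension
    Recurrence(LSTM(300)),           # apply the LSTM step across the sequence
    C.sequence.last,                 # final hidden state only
    Dense(5, activation=None)        # assumed 5 output classes
])
x = C.sequence.input_variable(10000) # assumed one-hot vocabulary size
z = model(x)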
CNTK – Convolutional Neural Networks
CNTK – CNN
MaxPooling(), AveragePooling()
GlobalMaxPooling(), GlobalAveragePooling()
CNTK – Convolutional Neural Networks
Example CNN
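A sketch of a small convolutional model built from these blocks; the MNIST-like dimensions are assumed, not taken from the original slide:

import cntk as C
from cntk.layers import Sequential, Convolution2D, MaxPooling, Dense

# Two convolution + max-pooling stages, then a dense classifier
model = Sequential([
    Convolution2D((5, 5), 32, pad=True, activation=C.relu),
    MaxPooling((2, 2), strides=(2, 2)),
    Convolution2D((5, 5), 64, pad=True, activation=C.relu),
    MaxPooling((2, 2), strides=(2, 2)),
    Dense(10, activation=None)          # 10 output classes
])
image = C.input_variable((1, 28, 28))   # (channels, height, width)
z = model(image)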
CNTK – Layers Library
Long Short-Term Memory, Gated Recurrent Unit, or Recurrent Neural Network step functions
LSTM(shape, cell_shape=None, activation=default_override_or(tanh),
use_peepholes=default_override_or(False),
init=default_override_or(glorot_uniform()), init_bias=default_override_or(0),
enable_self_stabilization=default_override_or(False),
name='')
GRU(shape, cell_shape=None, activation=default_override_or(tanh),
init=default_override_or(glorot_uniform()), init_bias=default_override_or(0),
enable_self_stabilization=default_override_or(False),
name='')
RNNStep(shape, cell_shape=None, activation=default_override_or(sigmoid),
init=default_override_or(glorot_uniform()),
init_bias=default_override_or(0),
enable_self_stabilization=default_override_or(False),
name='')
CNTK – Layers Library
Example: Recurrent LSTM (sketched below)
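A sketch of a recurrent LSTM tagger using the LSTM() signature above; the vocabulary, embedding, and label dimensions are assumed:

import cntk as C
from cntk.layers import Sequential, Embedding, Recurrence, LSTM, Dense

# Per-token tagging: one output vector for every sequence element
model = Sequential([
    Embedding(300),
    Recurrence(LSTM(512, use_peepholes=True,
                    enable_self_stabilization=True)),
    Dense(128, activation=None)      # assumed per-token label space
])
tokens = C.sequence.input_variable(50000)
z = model(tokens)                    # one output vector per token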
CNTK – Layers Library
Functions to create layers for batch normalization, layer normalization, and self-stabilization

BatchNormalization(map_rank=default_override_or(None),
init_scale=1,
normalization_time_constant=default_override_or(5000),
blend_time_constant=0, epsilon=default_override_or(0.00001),
use_cntk_engine=default_override_or(False), name='')
LayerNormalization(initial_scale=1, initial_bias=0,
epsilon=default_override_or(0.00001), name='')
Stabilizer(steepness=4,
enable_self_stabilization=default_override_or(True), name='')
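A sketch of BatchNormalization() inserted after a convolution; map_rank=1 ties the statistics across all pixel positions of each feature map, and the dimensions below are assumed:

import cntk as C
from cntk.layers import Sequential, Convolution2D, BatchNormalization, MaxPooling, Dense

model = Sequential([
    Convolution2D((3, 3), 64, pad=True, activation=None, bias=False),
    BatchNormalization(map_rank=1),  # normalize per feature map
    C.relu,                          # activation applied after normalization
    MaxPooling((2, 2), strides=(2, 2)),
    Dense(10, activation=None)
])
image = C.input_variable((3, 32, 32))
z = model(image)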
Łukasz Grala, Microsoft MVP
CEO, Data Architect
Lukasz.grala@cs.put.poznan.pl
+48 663832323
http://tidk.pl
Lukasz.grala@tidk.pl
http://dsaas.co
14-16 May 2018, Wrocław
http://sqlday.pl
FB/sqlday


Editor's Notes

  • #6 AI includes: ML (Bayesian learning, decision trees, rule sets), expert networks, neural networks, theorem proving, decision making with incomplete data, logical/rational reasoning, fuzzy logic, evolutionary algorithms
  • #38 Microsoft Azure Data Science VM overview: https://docs.microsoft.com/pl-pl/azure/machine-learning/data-science-virtual-machine/overview VM: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.linux-data-science-vm-ubuntu?tab=Overview https://azuremarketplace.microsoft.com/en-us/marketplace/apps?search=Data%20Science%20Virtual%20Machine&page=1
  • #52 Dense(): shape: output dimension of this layer. activation (default: None): pass a function to be used as the activation function, such as activation=relu. input_rank: if given, number of trailing dimensions that are transformed by Dense() (map_rank must not be given). map_rank: if given, the number of leading dimensions that are not transformed by Dense() (input_rank must not be given). init (default: glorot_uniform()): initializer descriptor for the weights; see cntk.initializer for a full list of random-initialization options. bias: if False, do not include a bias parameter. init_bias (default: 0): initializer for the bias. For Embedding(): shape: the dimension of the desired embedding vector; must not be None unless weights are passed. init: initializer descriptor for the weights to be learned; see cntk.initializer for a full list of initialization options. weights (numpy array): if given, embeddings are not learned but specified by this array (which could be, e.g., loaded from a file) and not updated further during training.
  • #56 Convolution(): filter_shape: shape of the receptive field of the filter, e.g. (5,5) for a 2D filter (not including the input feature-map depth). num_filters: number of output channels (number of filters). activation: optional non-linearity, e.g. activation=relu. init: initializer descriptor for the weights, e.g. glorot_uniform(); see cntk.initializer for a full list of random-initialization options. pad: if False (default), the filter is shifted over the "valid" area of the input, that is, no value outside the area is used; if True, the filter is applied to all input positions, and values outside the valid region are treated as zero. strides: increment when sliding the filter over the input, e.g. (2,2) to reduce the dimensions by 2. bias: if False, do not include a bias parameter. init_bias: initializer for the bias. use_correlation: currently always True and cannot be changed; it indicates that Convolution() actually computes the cross-correlation rather than the true convolution.
  • #57, #58 Pooling: filter_shape: receptive field (window) to pool over, e.g. (2,2) (not including the input feature-map depth). strides: increment when sliding the pool over the input, e.g. (2,2) to reduce the dimensions by 2. pad: if False (default), the pool is shifted over the "valid" area of the input, that is, no value outside the area is used; if True, the pool is applied to all input positions, and values outside the valid region are treated as zero. For average pooling, the count for the average does not include padded values.
  • #59 shape: dimension of the output cell_shape (optional): the dimension of the LSTM’s cell. If None, the cell shape is identical to shape. If specified, an additional linear projection will be inserted to project from the cell dimension to the output shape. use_peepholes (optional): if True, then use peephole connections in the LSTM init: initializer descriptor for the weights. See cntk.initializer for a full list of initialization options. enable_self_stabilization (optional): if True, insert a Stabilizer() for the hidden state and cell
  • #61 BatchNormalization: map_rank: if given then normalize only over this many leading dimensions. E.g. 1 to tie all (h,w) in a (C, H, W)-shaped input. Currently, the only allowed values are None (no pooling) and 1 (e.g. pooling across all pixel positions of an image) normalization_time_constant (default 5000): time constant in samples of the first-order low-pass filter that is used to compute mean/variance statistics for use in inference initial_scale: initial value of scale parameter epsilon: small value that gets added to the variance estimate when computing the inverse use_cntk_engine: if True, use CNTK’s native implementation. If false, use cuDNN’s implementation (GPU only). disable_regularization: if True then disable regularization in BatchNormalization. LayerNormalization: initial_scale: initial value of scale parameter initial_bias: initial value of bias parameter Stabilizer: steepness: sharpness of the knee of the softplus function