Cognitive Toolkit - Deep Learning framework from Microsoft
Łukasz Grala
lukasz@tidk.pl | lukasz.grala@cs.put.poznan.pl
Łukasz Grala
• Data Architect at TIDK
• Creator of "Data Scientist as a Service"
• Certified Microsoft Trainer and university lecturer
• Author of advanced training courses and workshops, as well as numerous publications and webcasts
• Honored with the Microsoft Data Platform MVP award every year since 2010
• PhD student at Poznań University of Technology, Faculty of Computing (databases, data mining, machine learning)
• Speaker at numerous conferences in Poland and worldwide
• Holder of numerous certifications (MCT, MCSE, MCSA, MCITP, …)
• Board member of the Polish Information Processing Society, Wielkopolska Branch
• Member and leader of Data Community Poland (formerly the Polish SQL Server User Group, PLSSUG)
• Passionate about data analysis, storage, and processing; jazz and MTB enthusiast
email lukasz@tidk.pl - lukasz.grala@cs.put.poznan.pl blog: grala.it
Agenda
• Overview
• Artificial Neural Networks & Deep Learning
• Software & Frameworks
• Cognitive Toolkit (aka CNTK)
Overview
Cognitive Toolkit - Deep Learning framework from Microsoft
Machine Learning
A brief timeline:
• 1763: the underpinnings of Bayes' Theorem
• 1805: Least Squares
• 1812: Bayes' Theorem
• 1913: Markov Chains
• 1950: Turing's Learning Machine
• 1951: first neural network machine
• 1958: single-layer neural network on a room-size computer
• 1967: Nearest Neighbor
• 1982: Recurrent Neural Network
• 1995: Random Forest algorithm, Support Vector Machines
• 1997: IBM Deep Blue beats Kasparov
• 2012: recognizing cats on YouTube
• 2016: AlphaGo
Machine Learning
“Can machines do what we (as thinking entities) can do?”
Alan Turing, “Computing Machinery and Intelligence”, Mind, 1950
Turing’s test
Machine Learning
“Machine Learning is a field of computer science that gives computers the ability to learn without being explicitly programmed.”
Arthur Samuel, “Some Studies in Machine Learning Using the Game of Checkers”, IBM Journal of Research and Development, 1959
Machine Learning
“A computer program is said to learn from experience
E with respect to some class of tasks T and
performance measure P if its performance at tasks in
T, as measured by P, improves with experience E.”
Tom M. Mitchell, “Machine Learning”, McGraw Hill, 1997
Supervised
Each point in the training data is associated with a label or output
The task is to learn a model/hypothesis that predicts outputs for points not
in the training dataset
Classification
Given a set of features, predict discrete outputs (Fraud/Not Fraud)
Regression
Given a set of features, predict continuous outputs (Credit score, item price, …)
Recommendation
Given a set of {user, item, rating} triplets and optionally features about users and items,
predict ratings for an item, items similar to a given item, users similar to a given user
Anomaly detection
Given a set of features for “normal” examples,
predict normal vs anomaly
Unsupervised
Points in the training dataset are not associated with known output values
The goal is to create a model that learns the inherent structure of the training data
Clustering
Given a training dataset, find a small number of centers, ‘k’, that are “close” to points in the dataset
Each point in the dataset is associated with at most a single center
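As an illustrative sketch (not from the original slides), the classic k-means procedure implements exactly this idea; the data and the value of k below are made up:

import numpy as np

# Minimal k-means sketch: alternate between assigning points to their
# nearest center and moving each center to the mean of its points.
X = np.random.randn(500, 2)                 # made-up 2D dataset
k = 3
centers = X[np.random.choice(len(X), k, replace=False)]
for _ in range(20):
    # assign each point to its nearest center
    labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
    # move each center to the mean of its assigned points (keep empty clusters in place)
    centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])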
Principal Component Analysis
Given a training dataset with ‘N’ features, find a set of ‘k’ features that approximates the data with bounded
error
The set of ‘k’ principal components is representative of the original dataset but with much lower
dimensionality
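As a minimal sketch (again not from the slides), the principal components can be computed from the singular value decomposition; all dimensions below are made up:

import numpy as np

# PCA sketch: project N-dimensional data onto its k leading principal components
X = np.random.randn(1000, 50)               # 1000 samples, N = 50 features
Xc = X - X.mean(axis=0)                     # center the data first
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = principal directions
k = 5
X_reduced = Xc @ Vt[:k].T                   # same data, k dimensions, bounded error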
Time Series
ARIMA, ETS,…
Machine Learning
• Regression: estimate product demand, predict sales figures, analyze marketing returns
• Anomaly Detection: predict credit risk, detect fraud, catch abnormal equipment readings
• Clustering: perform customer segmentation, predict customer tastes, determine market price
• Classification: two-class classification, multi-class classification
• Deep Learning: vision and speech, text, time series
Artificial Neural Networks
& Deep Learning
Machine Learning Introduction
Biological Neural Networks
Artificial Neural Networks
ANNs are processing devices (algorithms or actual hardware) that are loosely modeled after the neuronal
structure of the mammalian cerebral cortex, but on much smaller scales.
The simplest definition of a neural network, more properly referred to as an 'artificial' neural network
(ANN), is provided by the inventor of one of the first neurocomputers, Dr. Robert Hecht-Nielsen. He
defines a neural network as:
"...a computing system made up of a number of simple, highly interconnected processing
elements, which process information by their dynamic state response to external inputs. “
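To make the definition concrete, here is a minimal sketch of one such processing element in Python; the inputs and weights are made up:

import numpy as np

# One artificial neuron: a weighted sum of inputs passed through a
# nonlinear activation (sigmoid here).
def neuron(x, w, b):
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

x = np.array([0.5, -1.2, 3.0])   # inputs arriving from other units
w = np.array([0.4, 0.1, -0.6])   # connection weights
print(neuron(x, w, 0.1))         # the unit's output, in (0, 1)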
Neural Networks
Neural Networks
Overfitting
In the classic illustration, a green line represents an overfitted model and a black line a regularized one. While the green line follows the training data most closely, it is overly dependent on that data and is likely to show a higher error rate on new, unseen data than the black line.
Deep Learning Use Cases
• Text: sentiment analysis, augmented search, fraud detection, NLP
• Video and Image: facial recognition, emotion recognition, image search, photo clustering, tags, motion detection
• Sound: voice recognition, voice search, sentiment analysis, flaw detection
• Time Series: prediction, recommendation, risk detection
Convolutional network
Convolutional neural network (CNN, or ConvNet) is a class of deep,
feed-forward artificial neural networks that has successfully been
applied to analyzing visual imagery.
Convolutional network
ImageNet CNN
• The model that won the ImageNet competition in 2012
• 5 convolutional layers and 2 fully connected layers
• ReLU units and Dropout in the top layer; 60 million parameters
• 1.2 million training images
• Classification into 1000 classes
• Trained on two GPUs for a week
• 16.4% error (second place: 26.2%)
ImageNet CNN
Recurrent Neural Networks
A recurrent neural network (RNN) is a class of artificial neural
network where connections between units form a directed cycle.
Long Short-Term Memory Network
Hopfield Network Model
Boltzmann Machine Network
Deep Belief Network
Deep Belief Network
Deep Auto-encoders
ImageNet (ILSVRC top-5 error)
ImageNet
2017 - Fast R-CNN (from Microsoft Research)
Software & Frameworks
Cognitive Toolkit - Deep Learning framework from Microsoft
Deep Learning - Frameworks
Benchmark CNTK
https://github.com/Alexey-Kamenev/Benchmarks
https://github.com/Microsoft/CNTK
MicrosoftML
Learners
Algorithms            Strengths
rxFastLinear          Fast, accurate linear learner with automatic L1 & L2 regularization
rxLogisticRegression  Logistic regression with L1 & L2 regularization
rxFastTree            Boosted decision tree from Bing; competitive with XGBoost; the most accurate learner in most cases
rxFastForest          Random forest
rxNeuralNet           GPU-accelerated Net# DNNs with convolutions
rxOneClassSvm         Anomaly detection or unbalanced binary classification
Learners - Scalability
• Streaming (not RAM bound)
• Billions of features
• Multi-proc
• GPU acceleration for DNNs
• Distributed on Hadoop/Spark via Ensembling
Sentiment Analysis
• Pre-trained model
• Cognitive Service Parity
• Uses DNN Embedding
• Domain Adaptation
Image Featurization
Convolutional DNNs with GPU
Pre-trained Models
• ResNet-18
• ResNet-50
• ResNet-101
• AlexNet
ONNX is a community project created by Facebook and Microsoft.
ONNX provides a definition of an extensible computation graph model, as well as
definitions of built-in operators and standard data types.
Each computation dataflow graph is structured as a list of nodes that form an
acyclic graph. Nodes have one or more inputs and one or more outputs. Each
node is a call to an operator. The graph also has metadata to help document its
purpose, author, etc.
Operators are implemented externally to the graph, but the set of built-in
operators is portable across frameworks. Every framework supporting ONNX will
provide implementations of these operators on the applicable data types.
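As a sketch of how this looks from CNTK (assuming CNTK 2.3.1 or later, where ONNX export is available), a trained function can be saved in ONNX format; the model and file path below are placeholders:

import cntk as C

# Placeholder model standing in for a trained C.Function
z = C.layers.Dense(10)(C.input_variable(784))
# Save in ONNX format so other ONNX-capable frameworks can load it
z.save("model.onnx", format=C.ModelFormat.ONNX)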
Cognitive Toolkit
Cognitive Toolkit - Deep Learning framework from Microsoft
Cognitive Toolkit
• FFN, CNN, RNN/LSTM, Batch normalization, Sequence-to-Sequence with
attention and more
• Reinforcement learning, generative adversarial networks, supervised and
unsupervised learning
• Ability to add new user-defined core components on the GPU from Python
• Automatic hyperparameter tuning
• Built-in readers optimized for massive datasets
• Full APIs for defining networks, learners, readers, training and evaluation from
Python, C++, C#, BrainScript
• Evaluate models with Python, C++, C#, R and BrainScript
• Automatic shape inference based on your data
CNTK – Layers Library
Simple one-hidden-layer model – the Dense() function (sketched below)
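A minimal sketch of such a model in the Python API; the dimensions here are assumed (MNIST-like inputs), not taken from the original slide:

import cntk as C

# One hidden Dense layer plus a linear output layer (10-way classification)
features = C.input_variable(784)                      # e.g. a flattened 28x28 image
h = C.layers.Dense(200, activation=C.relu)(features)  # hidden layer
z = C.layers.Dense(10, activation=None)(h)            # output logits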
CNTK – Layers Library
Alternative: Sequential()
A 2011-style feed-forward speech-recognition network
with 6 hidden sigmoid layers of identical dimensions (sketched below)
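A sketch of that network with Sequential() and For(); the layer and output dimensions below are assumed, not taken from the original slide:

import cntk as C
from cntk.layers import Sequential, Dense, For

# Six identical sigmoid hidden layers, then a linear output layer
model = Sequential([
    For(range(6), lambda: Dense(2048, activation=C.sigmoid)),
    Dense(9000, activation=None)    # e.g. senone posteriors in speech models
])
features = C.input_variable(429)    # assumed acoustic feature dimension
z = model(features)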
CNTK – Layers Library
Multiple Layers
CNTK – Layers Library
Recurrence over single and multiple layers

Recurrence(step_function, go_backwards=default_override_or(False),
initial_state=default_override_or(0), return_full_state=False, name='')
RecurrenceFrom(step_function, go_backwards=default_override_or(False),
return_full_state=False, name='')
Fold(folder_function, go_backwards=default_override_or(False),
initial_state=default_override_or(0), return_full_state=False, name='')
UnfoldFrom(generator_function, until_predicate=None, length_increase=1, name='')
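For example (a sketch with assumed dimensions), Recurrence() can run an LSTM step function over a sequence, and C.sequence.last keeps only the final state for classification:

import cntk as C
from cntk.layers import Sequential, Embedding, Recurrence, LSTM, Dense

# Sequence classifier: embed tokens, run an LSTM over the sequence,
# keep the last hidden state, then classify
model = Sequential([
    Embedding(150),                  # assumed embedding dimension
    Recurrence(LSTM(300)),           # apply the LSTM step across the sequence
    C.sequence.last,                 # final hidden state only
    Dense(5, activation=None)        # assumed 5 output classes
])
x = C.sequence.input_variable(10000) # assumed one-hot vocabulary size
z = model(x)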
CNTK – Convolutional Neural Networks
CNTK – CNN
MaxPooling(), AveragePooling()
GlobalMaxPooling(), GlobalAveragePooling()
CNTK – Convolutional Neural Networks
Example CNN
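A sketch of a small convolutional model built from these blocks; the MNIST-like dimensions are assumed, not taken from the original slide:

import cntk as C
from cntk.layers import Sequential, Convolution2D, MaxPooling, Dense

# Two convolution + max-pooling stages, then a dense classifier
model = Sequential([
    Convolution2D((5, 5), 32, pad=True, activation=C.relu),
    MaxPooling((2, 2), strides=(2, 2)),
    Convolution2D((5, 5), 64, pad=True, activation=C.relu),
    MaxPooling((2, 2), strides=(2, 2)),
    Dense(10, activation=None)          # 10 output classes
])
image = C.input_variable((1, 28, 28))   # (channels, height, width)
z = model(image)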
CNTK – Layers Library
Long Short-Term Memory, Gated Recurrent Unit, or Recurrent Neural Network step functions
LSTM(shape, cell_shape=None, activation=default_override_or(tanh),
use_peepholes=default_override_or(False),
init=default_override_or(glorot_uniform()), init_bias=default_override_or(0),
enable_self_stabilization=default_override_or(False),
name='')
GRU(shape, cell_shape=None, activation=default_override_or(tanh),
init=default_override_or(glorot_uniform()), init_bias=default_override_or(0),
enable_self_stabilization=default_override_or(False),
name='')
RNNStep(shape, cell_shape=None, activation=default_override_or(sigmoid),
init=default_override_or(glorot_uniform()),
init_bias=default_override_or(0),
enable_self_stabilization=default_override_or(False),
name='')
CNTK – Layers Library
Example: Recurrent LSTM (sketched below)
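A sketch of a recurrent LSTM tagger using the LSTM() signature above; the vocabulary, embedding, and label dimensions are assumed:

import cntk as C
from cntk.layers import Sequential, Embedding, Recurrence, LSTM, Dense

# Per-token tagging: one output vector for every sequence element
model = Sequential([
    Embedding(300),
    Recurrence(LSTM(512, use_peepholes=True,
                    enable_self_stabilization=True)),
    Dense(128, activation=None)      # assumed per-token label space
])
tokens = C.sequence.input_variable(50000)
z = model(tokens)                    # one output vector per token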
CNTK – Layers Library
Functions to create layers for batch normalization, layer normalization, and self-stabilization

BatchNormalization(map_rank=default_override_or(None),
init_scale=1,
normalization_time_constant=default_override_or(5000),
blend_time_constant=0, epsilon=default_override_or(0.00001),
use_cntk_engine=default_override_or(False), name='')
LayerNormalization(initial_scale=1, initial_bias=0,
epsilon=default_override_or(0.00001), name='')
Stabilizer(steepness=4,
enable_self_stabilization=default_override_or(True), name='')
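A sketch of BatchNormalization() inserted after a convolution; map_rank=1 ties the statistics across all pixel positions of each feature map, and the dimensions below are assumed:

import cntk as C
from cntk.layers import Sequential, Convolution2D, BatchNormalization, MaxPooling, Dense

model = Sequential([
    Convolution2D((3, 3), 64, pad=True, activation=None, bias=False),
    BatchNormalization(map_rank=1),  # normalize per feature map
    C.relu,                          # activation applied after normalization
    MaxPooling((2, 2), strides=(2, 2)),
    Dense(10, activation=None)
])
image = C.input_variable((3, 32, 32))
z = model(image)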
Łukasz Grala, Microsoft MVP
CEO, Data Architect
Lukasz.grala@cs.put.poznan.pl
+48 663832323
http://tidk.pl
Lukasz.grala@tidk.pl
http://dsaas.co
14-16 May 2018, Wrocław
http://sqlday.pl
FB/sqlday


Editor's Notes

  • #6 AI includes: ML (Bayesian learning, decision trees, rule sets), expert networks, neural networks, theorem proving, decision making with incomplete data, logical/rational reasoning, fuzzy logic, evolutionary algorithms
  • #38 Microsoft Azure Data Science VM overview: https://docs.microsoft.com/pl-pl/azure/machine-learning/data-science-virtual-machine/overview VM: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.linux-data-science-vm-ubuntu?tab=Overview https://azuremarketplace.microsoft.com/en-us/marketplace/apps?search=Data%20Science%20Virtual%20Machine&page=1
  • #52 Dense(): shape: output dimension of this layer. activation (default: None): pass a function to be used as the activation function, such as activation=relu. input_rank: if given, number of trailing dimensions that are transformed by Dense() (map_rank must not be given). map_rank: if given, the number of leading dimensions that are not transformed by Dense() (input_rank must not be given). init (default: glorot_uniform()): initializer descriptor for the weights; see cntk.initializer for a full list of random-initialization options. bias: if False, do not include a bias parameter. init_bias (default: 0): initializer for the bias. For Embedding(): shape: the dimension of the desired embedding vector; must not be None unless weights are passed. init: initializer descriptor for the weights to be learned; see cntk.initializer for a full list of initialization options. weights (numpy array): if given, embeddings are not learned but specified by this array (which could be, e.g., loaded from a file) and not updated further during training.
  • #56 Convolution(): filter_shape: shape of the receptive field of the filter, e.g. (5,5) for a 2D filter (not including the input feature-map depth). num_filters: number of output channels (number of filters). activation: optional non-linearity, e.g. activation=relu. init: initializer descriptor for the weights, e.g. glorot_uniform(); see cntk.initializer for a full list of random-initialization options. pad: if False (default), the filter is shifted over the "valid" area of the input, that is, no value outside the area is used; if True, the filter is applied to all input positions, and values outside the valid region are treated as zero. strides: increment when sliding the filter over the input, e.g. (2,2) to reduce the dimensions by 2. bias: if False, do not include a bias parameter. init_bias: initializer for the bias. use_correlation: currently always True and cannot be changed; it indicates that Convolution() actually computes the cross-correlation rather than the true convolution.
  • #57, #58 Pooling: filter_shape: receptive field (window) to pool over, e.g. (2,2) (not including the input feature-map depth). strides: increment when sliding the pool over the input, e.g. (2,2) to reduce the dimensions by 2. pad: if False (default), the pool is shifted over the "valid" area of the input, that is, no value outside the area is used; if True, the pool is applied to all input positions, and values outside the valid region are treated as zero. For average pooling, the count for the average does not include padded values.
  • #59 shape: dimension of the output cell_shape (optional): the dimension of the LSTM’s cell. If None, the cell shape is identical to shape. If specified, an additional linear projection will be inserted to project from the cell dimension to the output shape. use_peepholes (optional): if True, then use peephole connections in the LSTM init: initializer descriptor for the weights. See cntk.initializer for a full list of initialization options. enable_self_stabilization (optional): if True, insert a Stabilizer() for the hidden state and cell
  • #61 BatchNormalization: map_rank: if given then normalize only over this many leading dimensions. E.g. 1 to tie all (h,w) in a (C, H, W)-shaped input. Currently, the only allowed values are None (no pooling) and 1 (e.g. pooling across all pixel positions of an image) normalization_time_constant (default 5000): time constant in samples of the first-order low-pass filter that is used to compute mean/variance statistics for use in inference initial_scale: initial value of scale parameter epsilon: small value that gets added to the variance estimate when computing the inverse use_cntk_engine: if True, use CNTK’s native implementation. If false, use cuDNN’s implementation (GPU only). disable_regularization: if True then disable regularization in BatchNormalization. LayerNormalization: initial_scale: initial value of scale parameter initial_bias: initial value of bias parameter Stabilizer: steepness: sharpness of the knee of the softplus function