The OpenPOWER AI Virtual University focuses on bringing together industry, government and academic expertise to connect and help shape the AI future.
https://www.youtube.com/channel/UCYLtbUp0AH0ZAv5mNut1Kcg
Artificial Neural Networks have been very successfully used in several machine learning applications. They are often the building blocks when building deep learning systems. We discuss the hypothesis, training with backpropagation, update methods, regularization techniques.
Application of artificial intelligence in the analysis of online media content... - PlovDev Conference
Data aggregation / transformation / refinement
Train corpora
Tokenization process
Attributes and instances
Vector space modeling / Bag of words
Feature selection
Machine learning algorithms / Classifiers – binary / multi-class, multi-label / problem transformation methods (a short code sketch follows this list)
Learning evaluation
Quality Management workflow / “Human in the loop” supervised learning
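To make the pipeline above concrete, here is a minimal sketch of a bag-of-words text classifier, assuming scikit-learn and an invented toy corpus (the texts and labels are illustrative only, not from the talk):

# Hypothetical example: bag-of-words text classification with scikit-learn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny illustrative training corpus with binary labels (1 = positive, 0 = negative)
train_texts = ["great article, well researched", "misleading and poorly sourced",
               "balanced and informative coverage", "clickbait with no substance"]
train_labels = [1, 0, 1, 0]

# Tokenization + vector space model (bag of words)
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Binary classifier
clf = LogisticRegression()
clf.fit(X_train, train_labels)

# Learning evaluation on unseen examples
test_texts = ["informative and balanced", "pure clickbait"]
print(clf.predict(vectorizer.transform(test_texts)))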
Mathematical Functions
Types of functions
Activation function
Laws of activation function
Types of Activation functions
Limitations of activation function
Piotr Mirowski - Review Autoencoders (Deep Learning) - CIUUK14 - Daniel Lewis
Piotr Mirowski (of Microsoft Bing London) presented Review of Auto-Encoders to the Computational Intelligence Unconference 2014, with our Deep Learning stream. These are his slides. Original link here: https://piotrmirowski.files.wordpress.com/2014/08/piotrmirowski_ciunconf_2014_reviewautoencoders.pptx
He also has a Matlab-based tutorial on auto-encoders, available here:
https://github.com/piotrmirowski/Tutorial_AutoEncoders/
Artificial neural network for machine learning - grinu
An Artificial Neural Network (ANN) is a computational model based on the structure and functions of biological neural networks; it processes information in a way similar to the human brain. An ANN includes a large number of connected processing units that work together to process information and generate meaningful results from it.
Neural networks Self Organizing Map by Engr. Edgar Carrillo II - Edgar Carrillo
This presentation talks about neural networks and self-organizing maps. In it, Engr. Edgar Caburatan Carrillo II also discusses their applications.
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ... - Simplilearn
This Deep Learning interview questions and answers presentation will help you prepare for Deep Learning interviews. It is ideal both for beginners and for professionals who are preparing for Deep Learning, Machine Learning or Data Science interviews. Learn the most important Deep Learning interview questions and answers, and know what will set you apart in the interview process.
Some of the important Deep Learning interview questions are listed below:
1. What is Deep Learning?
2. What is a Neural Network?
3. What is a Multilayer Perceptron (MLP)?
4. What is Data Normalization and why do we need it?
5. What is a Boltzmann Machine?
6. What is the role of Activation Functions in neural network?
7. What is a cost function?
8. What is Gradient Descent?
9. What do you understand by Backpropagation?
10. What is the difference between Feedforward Neural Network and Recurrent Neural Network?
11. What are some applications of Recurrent Neural Network?
12. What are Softmax and ReLU functions?
13. What are hyperparameters?
14. What will happen if learning rate is set too low or too high?
15. What is Dropout and Batch Normalization?
16. What is the difference between Batch Gradient Descent and Stochastic Gradient Descent?
17. Explain Overfitting and Underfitting and how to combat them.
18. How are weights initialized in a network?
19. What are the different layers in CNN?
20. What is Pooling in CNN and how does it work?
Simplilearn’s Deep Learning course will transform you into an expert in deep learning techniques using TensorFlow, the open-source software library designed for machine learning and deep neural network research. With our deep learning course, you’ll master deep learning and TensorFlow concepts, learn to implement algorithms, build artificial neural networks and traverse layers of data abstraction to understand the power of data, preparing you for your new role as a deep learning scientist.
Why Deep Learning?
TensorFlow is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are being seen in smartphone applications, creating efficiencies in the power grid, driving advancements in healthcare, improving agricultural yields, and helping us find solutions to climate change.
There is booming demand for skilled deep learning engineers across a wide range of industries, making this deep learning course with TensorFlow training well-suited for professionals at the intermediate to advanced level of experience. We recommend this deep learning online course particularly for the following professionals:
1. Software engineers
2. Data scientists
3. Data analysts
4. Statisticians with an interest in deep learning
Learn more at: https://www.simplilearn.com
This is a slide deck from a presentation that my colleague Shirin Glander (https://www.slideshare.net/ShirinGlander/) and I did together. As we created our respective parts of the presentation on our own, it is quite easy to figure out who did which part of the presentation, as the two slide decks look quite different ... :)
For the sake of simplicity and completeness, I just copied the two slide decks together. As I did the "surrounding" part, I added Shirin's part at the place where she took over and then added my concluding slides at the end. Well, I'm sure you will figure it out easily ... ;)
The presentation was intended to be an introduction to deep learning (DL) for people who are new to the topic. It starts with some DL success stories as motivation. Then a quick classification and a bit of history follows before the "how" part starts.
The first part of the "how" is some theory of DL, to demystify the topic and explain and connect some of the most important terms on the one hand, but also to give an idea of the broadness of the topic on the other hand.
After that, the second part dives deeper into the question of how to actually implement DL networks. This part starts with coding it all on your own and then moves on to less coding step by step, depending on where you want to start.
The presentation ends with some pitfalls and challenges that you should have in mind if you want to dive deeper into DL - plus the invitation to become part of it.
As always the voice track of the presentation is missing. I hope that the slides are of some use for you, though.
This is a slide deck from a presentation that my colleague Uwe Friedrichsen (https://www.slideshare.net/ufried/) and I did together. As we created our respective parts of the presentation on our own, it is quite easy to figure out who did which part of the presentation, as the two slide decks look quite different ... :)
For the sake of simplicity and completeness, Uwe copied the two slide decks together. As he did the "surrounding" part, he added my part at the place where I took over and then added concluding slides at the end. Well, I'm sure you will figure it out easily ... ;)
The presentation was intended to be an introduction to deep learning (DL) for people who are new to the topic. It starts with some DL success stories as motivation. Then a quick classification and a bit of history follows before the "how" part starts.
The first part of the "how" is some theory of DL, to demystify the topic and explain and connect some of the most important terms on the one hand, but also to give an idea of the broadness of the topic on the other hand.
After that, the second part dives deeper into the question of how to actually implement DL networks. This part starts with coding it all on your own and then moves on to less coding step by step, depending on where you want to start.
The presentation ends with some pitfalls and challenges that you should have in mind if you want to dive deeper into DL - plus the invitation to become part of it.
As always the voice track of the presentation is missing. I hope that the slides are of some use for you, though.
CSSC × GDSC: Intro to Machine Learning!
Aaron Shah and Manav Bhojak on October 5, 2023
🤖 Join us for an exciting ML Workshop! 🚀 Dive into the world of Machine Learning, where we'll unravel the mysteries of CNNs, RNNs, Transformers, and more. 🤯
Get ready to embark on a journey of discovery! We'll begin with an easy-to-follow introduction to the fascinating realm of ML. 📚
🛠️ In our hands-on session, we'll walk you through setting up your environment. No tech hurdles here! 🌐
🔍 Then, we'll get down to the nitty-gritty, guiding you through our starter code for a thrilling hands-on example. Together, we'll explore the power of ML in action! 💡
The Libre-SOC Project aims to create an entirely Libre-Licensed, transparently-developed fully auditable Hybrid 3D CPU-GPU-VPU, using the Supercomputer-class OpenPOWER ISA as the foundation.
Our first test ASIC is a 180nm "Fixed-Point" Power ISA v3.0B processor, 5.1mm x 5.9mm, as a proof-of-concept for the team, whose primary expertise is in Software Engineering. Software Engineering training brings a radically different approach to hardware development: extensive unit tests, source code revision control and automated development tools are normal. Libre Project Management brings even more: bug trackers, mailing lists, auditable IRC logs and a wiki are standard fare for Libre Projects, but are simply not normal industry-standard practice.
This talk therefore goes through the workflow, from the original HDL through to the GDS-II layout, showing how we were able to keep track of the development that led to the IMEC 180nm tape-out in July 2021. In particular, by following a parallel development process involving "Real" and "Symbolic" Cell Libraries developed by Chips4Makers, we show how our developers did not need to sign a Foundry NDA but were still able to work side-by-side with a University that did. With this parallel development process, the University upheld its NDA obligations, and Libre-SOC was simultaneously able to honour its Transparency Objectives.
Workload Transformation and Innovations in POWER Architecture - Ganesan Narayanasamy
The IT industry is going through two major transformations. The first is the adoption of AI and its tight integration into commercial applications and enterprise workflows. The second is the transformation of software architecture through concepts like microservices and cloud-native design. These transformations, alongside the aggressive adoption of IoT, mobile and 5G in our day-to-day activities, are making the world operate in a more real-time manner, which opens up a new challenge: improving hardware architecture to meet these requirements. Together, these two transformations push the boundaries of the entire systems stack, making designers rethink hardware. This talk presents a picture of how the industry-leading enterprise POWER architecture is transforming to fulfill the performance demands of these newer-generation workloads, with a primary focus on on-chip AI acceleration.
Join us on Friday, July 16th 2021 for our newest workshop with DoMS, IIT Roorkee: Concept to Solutions using the OpenPOWER Stack. It's time to discover advances in #DeepLearning tools and techniques from the world's leading innovators across industries, research, and public speakers.
Register here:
https://lnkd.in/ggxMq2N
This presentation covers two use cases using OpenPOWER Systems:
1. Diabetic Retinopathy using AI on NVIDIA Jetson Nano: the objective is to classify the diabetic-retinopathy level solely from a retina image in a remote area with minimal doctor intervention. The model uses the VGG16 network architecture and is trained from scratch on POWER9. The model was deployed on the Jetson Nano board (see the sketch after this list).
2. Classifying COVID positivity using lung X-ray images: the idea is to build ML models to detect positive cases from X-ray images. The model was trained on POWER9, and the application was developed in Python.
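A rough model-definition sketch for the first use case, assuming tf.keras, 224x224 RGB retina images and five severity classes (the input size and class count are assumptions, not details given in the talk):

# Hypothetical sketch: VGG16 defined from scratch (no pretrained weights)
import tensorflow as tf

NUM_CLASSES = 5  # assumed number of diabetic-retinopathy severity levels
model = tf.keras.applications.VGG16(
    weights=None,                # train from scratch, as described in the use case
    input_shape=(224, 224, 3),   # assumed input resolution
    classes=NUM_CLASSES)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=..., validation_data=...) on POWER9;
# the trained model would then be exported for inference on the Jetson Nano.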
IBM Bayesian Optimization Accelerator (BOA) is a do-it-yourself toolkit to apply state-of-the-art Bayesian inferencing techniques and obtain optimal solutions for complex, real-world design simulations without requiring deep machine learning skills. This talk will describe IBM BOA, its differentiation and ease of use, and how researchers can take advantage of it for optimizing any arbitrary HPC simulation.
This presentation covers the various partners and collaborators currently working with the OpenPOWER Foundation, use cases of OpenPOWER systems in multiple industries, OpenPOWER workgroups, and OpenCAPI features.
The IBM POWER10 processor represents the 10th generation of the POWER family of enterprise computing engines. Its performance is a result of both powerful processing cores and high-bandwidth intra- and inter-chip interconnect. POWER10 systems can be configured with up to 16 processor chips and 1920 simultaneous threads of execution. Cross-system memory sharing, through the new Memory Inception technology, and 2 Petabytes of addressing space support an expansive memory system. The POWER10 processing core has been significantly enhanced over its POWER9 predecessor, including a doubling of vector units and the addition of an all-new matrix math engine. Throughput gains from POWER9 to POWER10 average 30% at the core level and three-fold at the socket level. Those gains can reach ten- or twenty-fold at the socket level for matrix-intensive computations.
Everything is changing, from healthcare to the automotive markets, not forgetting financial markets or any type of engineering: everything has stopped being created by an individual or, at best, a team, and is now being developed and perfected using AI and hundreds of computers. And even AI is something we can no longer run on a single computer, no matter how powerful it is. What drives everything today is HPC, or High-Performance Computing, heavily linked to AI. In this session we will discuss AI, HPC computing, the IBM Power architecture and how it can help develop better healthcare, better automobiles, better financials and better everything that we run on them.
Macromolecular crystallography is an experimental technique for exploring the 3D atomic structure of proteins, used by academics for research in biology and by pharmaceutical companies in rational drug design. While development of the technique has until now been limited by the performance of scientific instruments, computing performance has recently become a key limitation. In my presentation I will present the computing challenge of handling an 18 GB/s data stream coming from the new X-ray detector. I will show PSI's experience in applying conventional hardware to the task and why this attempt failed. I will then present how an IC 922 server with OpenCAPI-enabled FPGA boards allowed us to build a sustainable and scalable solution for high-speed data acquisition. Finally, I will give a perspective on how advancements in hardware development will enable better science for users of the Swiss Light Source.
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems - Ganesan Narayanasamy
As the adoption of AI technologies increases and matures, the focus will shift from exploration to time to market, productivity and integration with existing workflows. Governing enterprise data, scaling AI model development, and selecting a complete, collaborative hybrid platform and tools for rapid solution deployment are key focus areas for growing data scientist teams tasked with responding to business challenges. This talk covers the challenges and innovations for AI at scale in industries such as healthcare and automotive, the AI ladder and AI life cycle, and infrastructure architecture considerations.
This talk gives an introduction to healthcare use cases, the AI ladder and life cycle, and AI-at-scale themes. The iterative nature of the workflow and some of the important components to be aware of when developing AI healthcare solutions are discussed. The different types of algorithms, and when machine learning might be more appropriate than deep learning or the other way around, are also discussed. Use cases are shared as examples as part of this presentation.
Healthcare has become one of the most important aspects of everyone's life. Its importance has surged due to the latest outbreaks, and because of this latest pandemic it has become mandatory to collaborate to improve everyone's healthcare as soon as possible.
IBM has reacted quickly, sharing not only its knowledge but also its Artificial Intelligence supercomputers all around the world.
Those supercomputers are helping to prevail over this outbreak and also future ones.
They have completely different features compared to proposals from other players in this supercomputer market.
We will take a quick look at the differences between those AI-focused supercomputers and how they can help in the R&D of healthcare solutions for everyone, from those with access to a big IBM AI supercomputer to those with access to only one small IBM AI-focused server.
Moving object recognition (MOR) corresponds to the localization and classification of moving objects in videos. Discriminating moving objects from static objects and background in videos is an essential task for many computer vision applications. MOR has widespread applications in intelligent visual surveillance, intrusion detection, anomaly detection and monitoring, industrial site monitoring, detection-based tracking, autonomous vehicles, etc. In this session, Murari presented a poster on deep learning algorithms that identify both the locations and the corresponding categories of moving objects with a convolutional network. The challenges in developing such algorithms are also discussed.
Clarisse Hedglin from IBM presented this as part of a 3-day international summit. She shared the scenarios AI can solve for today using the IBM AI infrastructure.
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Generative AI Deep Dive: Advancing from Proof of Concept to Production - Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best-practices guide outlines steps users can take to better protect personal devices and information.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Pushing the limits of ePRTC: 100ns holdover for 100 days - Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
2. Machine Learning Basics
Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed.
Methods that can learn from and make predictions on data.
[Diagram: training phase - labeled data is fed to a machine learning algorithm, producing a learned model; prediction phase - the learned model is applied to new data to produce a prediction.]
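As a minimal illustration of the training/prediction loop in the diagram, here is a sketch using scikit-learn with made-up labeled data (the library and the data are assumptions for illustration only):

# Labeled data -> machine learning algorithm (training) -> learned model -> prediction
from sklearn.tree import DecisionTreeClassifier

X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]    # labeled training data (features)
y_train = [0, 1, 1, 0]                        # labels
model = DecisionTreeClassifier().fit(X_train, y_train)   # training phase
print(model.predict([[1, 0]]))                            # prediction on new data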
3. Types of Learning
01 Supervised: learning with a labeled training set. Example: email classification with already labeled emails.
02 Unsupervised: discover patterns in unlabeled data. Example: cluster similar documents based on text.
03 Reinforcement learning: learn to act based on feedback/reward. Example: learn to play Go; reward: win or lose.
[Figure: illustrative plots of regression, classification (class A vs class B) and clustering.]
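The unsupervised example from the slide (clustering similar documents) can be sketched as follows, assuming scikit-learn and a toy set of documents (illustrative only):

# Cluster similar documents based on text, without any labels
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["the cat sat on the mat", "a dog chased the cat",
        "stocks fell sharply today", "markets rallied after the report"]
X = TfidfVectorizer().fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # two groups discovered from the text alone, no labels provided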
4. Machine Learning vs Deep Learning
● Most machine learning methods work well because of human-designed representations and input features.
● ML becomes just optimizing weights to best make a final prediction.
● Thus, machine learning uses algorithms to parse data, learn from that data, and make informed decisions based on what it has learned. Deep learning structures algorithms in layers to create an “artificial neural network” that can learn and make intelligent decisions on its own.
● In practical terms, deep learning is just a subset of machine learning. It technically is machine learning and functions in a similar way (hence why the terms are sometimes loosely interchanged), but its capabilities are different.
5. Machine Learning vs Deep Learning
https://www.xenonstack.com/blog/static/public/uploads/media/machine-learning-vs-deep-learning.png
6. What is Deep Learning?
● Deep learning is a subset of machine learning in Artificial Intelligence (AI) that has networks capable of learning unsupervised from data that is unstructured or unlabeled. Also known as Deep Neural Learning or Deep Neural Networks.
In simple terms:
● Deep learning algorithms attempt to learn (multiple levels of) representation by using a hierarchy of multiple layers.
● If you provide the system tons of information, it begins to understand it and respond in useful ways.
7. Why Deep Learning is useful?
● Manually designed features are often over-specified, incomplete and take a long time to design and validate.
● Learned features are easy to adapt and fast to learn.
● Deep learning provides a very flexible, universal, learnable framework for representing world, visual and linguistic information.
● Can learn both unsupervised and supervised.
● Effective end-to-end joint system learning.
● Utilizes large amounts of training data.
8. Introduction to Neural Networks
● A deep neural network consists of a hierarchy of layers, whereby each layer transforms the input data into more abstract representations (e.g. edge -> nose -> face). The output layer combines those features to make predictions.
● It consists of one input layer, one output layer and multiple fully-connected hidden layers in between. Each layer is represented as a series of neurons and progressively extracts higher and higher-level features of the input until the final layer essentially makes a decision about what the input shows. The more layers the network has, the higher-level features it will learn.
10. What is a neuron?
● Neurons are trained to filter and detect specific features or patterns (e.g. edge, nose) by receiving weighted input, transforming it with the activation function and passing the result on to the next layer.
● An artificial neuron contains an activation function and has several incoming and outgoing weighted connections.
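A single artificial neuron can be sketched in a few lines of NumPy; the weights, bias and inputs below are arbitrary illustrative values:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    # weighted sum of the incoming connections, then the activation function
    z = np.dot(weights, inputs) + bias
    return sigmoid(z)

print(neuron(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, -0.3]), bias=0.2))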
11. Activation Functions
● They define the output of that node given an input or set of inputs
● They decide whether a neuron should be activated or not.
● The Activation Functions can basically be divided into 2 types:
● Linear Activation Function
Identity Activation Function
● Non-linear Activation Functions
Sigmoid or Logistic Activation Function
Tanh or hyperbolic tangent Activation Function
ReLU (Rectified Linear Unit) Activation Function
Leaky ReLU
12. Types of Activation Functions
● Linear Activation Function
1. Identity Activation Function
● Non-linear Activation Functions
1. Sigmoid or Logistic Activation Function
2. Tanh or hyperbolic tangent Activation Function
3. ReLU (Rectified Linear Unit) Activation Function
4. Leaky ReLU
13. Linear Activation Function - Identity Function
● The output of the function will not be confined between any range.
● Equation: f(x) = x
● Range: (-infinity to infinity)
● It doesn’t help with the complexity or various parameters of the usual data that is fed to the neural networks.
[Figure: Identity activation function]
14. Non-Linear Activation Function - Sigmoid
● It exists between (0 to 1).
● For models where we have to predict the probability as an output, sigmoid is the right choice.
● The function is differentiable; we can find the slope of the sigmoid curve at any two points.
● The function is monotonic but the function’s derivative is not.
[Figure: Sigmoid activation function]
15. Non-Linear Activation Function - Tanh
● The range of the tanh function is from (-1 to 1). Tanh is also sigmoidal.
● Negative inputs will be mapped strongly negative and zero inputs will be mapped near zero in the tanh graph.
● It is differentiable and monotonic, while its derivative is not monotonic.
● The tanh function is mainly used for classification between two classes.
[Figure: Hyperbolic tangent activation function]
16. Non-Linear Activation Function - ReLU (Rectified Linear Unit)
● The ReLU is half rectified (from the bottom): f(z) is zero when z is less than zero and f(z) is equal to z when z is above or equal to zero.
● Range: [0 to infinity)
● The function and its derivative are both monotonic.
● Issue: negative values become zero immediately, which decreases the ability of the model to fit or train from the data properly.
[Figure: ReLU activation function]
17. Non-Linear Activation Function - Leaky ReLU (Rectified Linear Unit)
● The leak helps to increase the range of the ReLU function. Usually, the value of a is 0.01 or so.
● When a is not 0.01, it is called Randomized ReLU.
● Range: (-infinity to infinity).
● Both Leaky and Randomized ReLU functions are monotonic in nature, and their derivatives are also monotonic in nature.
[Figure: ReLU vs Leaky ReLU activation functions]
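The activation functions from the last few slides can be written directly in NumPy; this is a plain illustrative sketch, not tied to any particular framework:

import numpy as np

def identity(x):             # linear activation: f(x) = x
    return x

def sigmoid(x):              # output confined to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                 # output confined to (-1, 1)
    return np.tanh(x)

def relu(x):                 # zero for negative inputs, x otherwise
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):   # small slope a for negative inputs
    return np.where(x >= 0, x, a * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (identity, sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(x))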
19. Overfitting
● The learned hypothesis may fit the training data very well, including the outliers (noise), but may thus fail to generalize to new examples (test data).
● How to overcome this:
1. Use of validation data
2. Regularization: Dropout layers
20. Regularization - Dropout
Dropout:
● Randomly drop units (along with their connections) during training.
● Each unit is retained with fixed probability p, independent of other units.
● The hyper-parameter p is to be chosen (tuned).
Reference: Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research (2014)
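A common way to implement the idea above is "inverted dropout", where surviving activations are rescaled so that nothing changes at test time; the sketch below is illustrative NumPy, and the keep probability is an assumed value:

import numpy as np

def dropout(activations, p_keep=0.8, training=True):
    # Randomly zero units with probability 1 - p_keep during training;
    # scale the survivors by 1 / p_keep so expected activations are unchanged.
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) < p_keep
    return activations * mask / p_keep

h = np.random.randn(4, 5)       # a batch of hidden-layer activations
print(dropout(h, p_keep=0.8))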
21. Regularization - L2 Weight Decay and Early Stopping
L2 = weight decay
● Regularization term that penalizes big weights, added to the objective.
● The weight decay value determines how dominant regularization is during gradient computation.
● Big weight decay coefficient → big penalty for big weights.
Early stopping
● Use validation error to decide when to stop training.
● Stop when the monitored quantity has not improved after ‘n’ subsequent epochs, where ‘n’ is called patience.
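Both techniques are available out of the box in common frameworks; in tf.keras they might look like the sketch below (the layer sizes, weight-decay coefficient and patience value are assumptions for illustration):

import tensorflow as tf

l2 = tf.keras.regularizers.l2(1e-4)   # weight-decay coefficient (assumed value)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", kernel_regularizer=l2,
                          input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid", kernel_regularizer=l2),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Stop when validation loss has not improved for 'patience' epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stop])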
22. Convolutional Neural Network
● Convolutional neural networks form a subclass of feedforward neural networks that have special weight constraints; individual neurons are tiled in such a way that they respond to overlapping regions.
A ConvNet has two parts:
● Feature learning (Conv, ReLU and Pool)
● Classification (FC and softmax)
24. Convolutional Layers
The objective of a convolutional layer is to extract features of the input volume.
Some common terms:
● Filter, Kernel, or Feature Detector: a small matrix used for feature detection.
● Convolved Feature, Activation Map or Feature Map: the output volume formed by sliding the filter over the image and computing the dot product.
● Receptive field: a local region of the input volume that has the same size as the filter.
25. Convolutional Layers
● Depth is the number of filters.
● Stride has the objective of producing smaller output volumes spatially.
● Zero-padding adds zeros around the outside of the input volume so that the convolutions end up with the same number of outputs as inputs. If we don’t use padding, the information at the borders will be lost after each Conv layer.
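The sliding-filter operation, stride and zero-padding described above can be expressed in a short NumPy sketch (a naive reference implementation for illustration, not an optimized one):

import numpy as np

def conv2d(image, kernel, stride=1, pad=0):
    # Zero-pad the image, then slide the filter and take dot products.
    image = np.pad(image, pad)
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]  # receptive field
            out[i, j] = np.sum(patch * kernel)                         # feature map entry
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
vertical_edge_filter = np.array([[1., 0., -1.], [1., 0., -1.], [1., 0., -1.]])
print(conv2d(image, vertical_edge_filter, stride=1, pad=1).shape)  # (6, 6): same-size output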
27. WORKING OF A CNN
Reference: https://cdn-images-1.medium.com/max/800/1*_34EtrgYk6cQxlJ2br51HQ.gif
28. Pooling Layers
Reference: https://shafeentejani.github.io/assets/images/pooling.gif
● The pool layer performs a function to reduce the spatial dimensions of the input, and the computational complexity of our model.
● It also controls overfitting. It operates independently on every depth slice of the input.
● There are different functions such as Max Pooling, Average Pooling, or L2-Norm Pooling.
● Max pooling (the most used) only takes the most important part (the value of the brightest pixel) of the input.
[Figure: example of max pooling]
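Max pooling can likewise be sketched in NumPy; the 2x2 window with stride 2 below is a common choice but is an assumption here:

import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # Keep only the largest value in each size x size window (per depth slice).
    h, w = feature_map.shape
    out = np.zeros((h // stride, w // stride))
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            out[i // stride, j // stride] = feature_map[i:i+size, j:j+size].max()
    return out

fm = np.array([[1., 3., 2., 1.],
               [4., 6., 5., 2.],
               [7., 2., 9., 1.],
               [3., 1., 4., 8.]])
print(max_pool(fm))   # [[6. 5.] [7. 9.]]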
29. Fully Connected (FC) Layers
● Fully Connected Layer (FC): fully connected layers connect every neuron in one layer to every neuron in another layer.
● The last fully-connected layer may use a softmax activation function for classifying the generated features of the input image into various classes based on the training dataset.
[Figure: fully connected layer]
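A fully connected layer followed by softmax can be sketched as a matrix multiply plus a normalization; the sizes below are arbitrary illustrative choices:

import numpy as np

def softmax(logits):
    # Exponentiate and normalize so the outputs form a probability distribution.
    z = logits - np.max(logits)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fully_connected(x, W, b):
    # Every input unit connects to every output unit through the weight matrix W.
    return W @ x + b

x = np.array([0.2, -1.3, 0.8])            # features produced by earlier layers
W = np.random.randn(4, 3) * 0.1           # 4 output classes, 3 input features
b = np.zeros(4)
print(softmax(fully_connected(x, W, b)))  # class probabilities summing to 1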
30. PoC: Scalable Implementation of Blood Cells Classification Using Deep Learning
- Aditya Mitkari, Sayali Deshpande, Saurabh Kshirsagar
The existing method for blood cell classification may contribute to inaccurate, inconsistent and poorly reliable diagnoses that may lead to false-diagnosis situations. Thus there is a need for a fast, simple, efficient, rotation- and scale-invariant blood cell identification system which can be used in automating laboratory reporting.
Potential scenarios:
● Accurate and faster blood cell analysis
● Real-time diagnosis of patient condition
31. PoC: Scalable Implementation of Blood Cells Classification Using Deep Learning
[Diagram: training data from Kaggle → data preprocessing → processed data → CNN layers (convolutional layers with ReLU, dropout layers) and fully connected layers, running on PowerAI → predicted output across 4 classes.]
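One plausible reading of that pipeline as a tf.keras model is sketched below; the layer counts, filter sizes, pooling and input resolution are assumptions, while the four-class output and the Conv/ReLU/Dropout/FC structure come from the slide:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(120, 160, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(4, activation="softmax"),  # 4 blood-cell classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])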
32. PoC: Scalable Implementation of Blood Cells Classification Using Deep Learning
The existing method for blood cell classification may contribute to inaccurate, inconsistent and poorly reliable diagnoses that may lead to false-diagnosis situations. Thus there is a need for a fast, simple, efficient, rotation- and scale-invariant blood cell identification system which can be used in automating laboratory reporting.
Potential scenarios:
● Accurate and faster blood cell analysis
● Real-time diagnosis of patient condition
33. PoC: Results
With the currently used dataset we have achieved:
● A testing accuracy of 58%
● A loss of 0.25 after training for 5000 steps
Future scope:
By using a larger dataset with more data points that reflect the variance of the data, we can improve the model’s accuracy. We also plan to implement our model for classification of cell types other than white blood cells.
34. OpenPOWER Foundation and OpenPOWER Academia
The OpenPOWER Foundation is a consortium formed by companies including Google, IBM, Nvidia, Mellanox and Tyan, revolving around the Power Architecture. It is an open technical community based on the POWER architecture, enabling collaborative development and opportunity for member differentiation and industry growth.
It is the intent of the OpenPOWER Foundation to:
● Open the POWER architecture to give the industry the ability to innovate across the full hardware and software stack
35. OpenPOWER Academia
● The OpenPOWER Academia Discussion Group (ADG) serves as a platform within the OpenPOWER Foundation for its academic members.
● It comprises a broad range of academic institutions with strong competence, e.g., in providing and operating high-performance computing facilities or in developing scientific applications, scalable numerical methods, new approaches for processing extreme-scale data or new methods for data analytics.
● The main motive of OpenPOWER Academia is to encourage and implement solutions to solve real-world problems.
36. System Overview - OpenPOWER and IBM PowerAI
OpenPOWER™ architecture as a way to innovate and evangelize the POWER technologies.
IBM PowerAI: a software platform for deep learning with IBM® Power Systems™, for rapid deployment of a fully optimized and supported platform for AI with blazing performance.
IBM® Power Systems™ S822LC (Minsky):
● 128 threads, 4 x NVIDIA P100 GPUs with NVLink
● PushToCompute™ for compiling POWER8 applications and deploying directly to the Nimbix Cloud
37. System Details
POWER9 – Premier Acceleration Platform
○ State-of-the-art I/O and acceleration attachment signaling
   ○ PCIe Gen 4 (16G) x 48 lanes
   ○ 192 GB/s duplex bandwidth
   ○ 25G Link x 48 lanes
   ○ 300 GB/s duplex bandwidth
○ Robust accelerated compute options with OPEN standards
   ○ On-chip acceleration: Gzip x1, 842 Compression x2, AES/SHA x2
   ○ CAPI 2.0
   ○ OpenCAPI 3.0 – high bandwidth, low latency and open interface using the 25G Link
   ○ NVLink 2.0 – next generation of GPU/CPU bandwidth and integration