4. 1 Machine Learning
◂ Machine learning is the scientific study of algorithms and statistical models
that computer systems use to perform a specific task without using explicit
instructions, relying on patterns and inference instead. It is seen as a subset
of artificial intelligence.
5. 2 Ingredients for Training an ML Algorithm
Data
Model
Objective function
Optimization Algorithm
6. Data
◂ First we must prepare a certain amount of data to train with. Usually
this is historical data which is readily available.
7. Model
◂ The simplest model we can train is a linear model.
◂ In a weather forecast problem, that means finding some coefficients, multiplying
each variable by its coefficient, and summing everything to get the output.
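A minimal sketch of such a linear model. The two inputs (temperature and humidity), the coefficients and the bias are hypothetical values chosen for illustration; they do not come from the slides.

```python
def linear_model(inputs, weights, bias=0.0):
    """Multiply each input by its coefficient and sum everything."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Hypothetical inputs: today's temperature (deg C) and humidity (fraction).
inputs = [20.0, 0.5]
# Hypothetical coefficients; in practice these are found by training.
weights = [0.5, 10.0]

prediction = linear_model(inputs, weights, bias=1.0)
print(prediction)  # 0.5*20 + 10*0.5 + 1 = 16.0
```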
8. Objective Function
DATA (fed to) → MODEL (to obtain) → OUTPUT
We want the output to be as close to reality as possible. That is where the objective function comes in: it estimates how correct the model's outputs are.
Here our goal is to minimize the objective function, that is, the error.
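The slides do not name a specific objective function. As one common choice, the sketch below assumes the mean squared error; the prediction and target values are made up.

```python
def mean_squared_error(predictions, targets):
    """Average of squared differences; 0 means a perfect fit."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

targets = [3.0, 5.0, 7.0]   # hypothetical "reality"
good = [3.0, 5.0, 6.0]      # hypothetical outputs of a decent model
bad = [0.0, 0.0, 0.0]       # hypothetical outputs of a poor model

print(mean_squared_error(good, targets))  # 1/3
print(mean_squared_error(bad, targets))   # (9 + 25 + 49) / 3
```

The lower the value, the closer the outputs are to reality, so training tries to make this number small.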
9. Optimization Algorithm
◂ It consists of the mechanics through which we vary the parameters of the
model to optimize the objective function.
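One widely used optimization algorithm is gradient descent. The sketch below assumes a one-parameter model y ≈ w·x, a mean squared error objective, and a hand-picked learning rate and step count; the slides do not prescribe any of these.

```python
def train(xs, ys, lr=0.05, steps=200):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # dE/dw for E = mean((w*x - y)^2)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad   # step against the gradient to reduce the error
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]     # hypothetical data generated by y = 2x
w = train(xs, ys)
print(w)                 # converges towards 2
```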
11. 3 Types of ML Algorithms
◂ Supervised Learning
◂ Unsupervised Learning
◂ Reinforcement learning
12. 3.1 Supervised Learning
◂ Learning with a labeled training set. Starting from the analysis of a known training
dataset, the learning algorithm produces an inferred function to make predictions
about the output values. The system is able to provide targets for any new input
after sufficient training.
◂ Example: email classification and tea making trained on labeled data.
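As a toy illustration of learning from a labeled training set, the sketch below uses a nearest-neighbour classifier. The technique, the two email features and the data are all assumptions made for this example.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training example closest to `query`."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_x)),
               key=lambda i: squared_distance(train_x[i], query))
    return train_y[best]

# Hypothetical features per email: (number of links, count of the word "free").
train_x = [(0, 0), (1, 0), (7, 5), (9, 4)]
train_y = ["ham", "ham", "spam", "spam"]   # the labels

print(nearest_neighbor_predict(train_x, train_y, (8, 6)))  # spam
print(nearest_neighbor_predict(train_x, train_y, (0, 1)))  # ham
```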
13. 3.2 Unsupervised Learning
◂ Unsupervised learning studies how systems can infer a function to describe
a hidden structure from unlabeled data. The system doesn’t figure out the
right output; instead it explores the data and draws inferences from
datasets. Clustering is a typical example.
◂ Example: grouping houses by price or animals by similarity, without labels
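Clustering can be sketched with k-means, here on one-dimensional data. The algorithm choice, the points and the starting centers are assumptions for illustration only.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D data: assign to nearest center, then recompute means."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # unlabeled, hypothetical data
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

No labels are given; the two groups emerge from the data alone.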
14. 3.3 Reinforcement learning
◂ The agent interacts with its environment by producing actions and discovering errors or
rewards. This method allows machines and software agents to
automatically maximize their performance.
◂ Example: learn to play Go, reward: win or lose
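Go is far too large for a short example, so this sketch swaps in a much simpler setting: a three-action bandit with hypothetical fixed payoffs, learned with an epsilon-greedy rule. It only illustrates the idea of learning from reward feedback.

```python
import random

def epsilon_greedy(payoffs, steps=1000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from reward feedback."""
    rng = random.Random(seed)
    estimates = [0.0] * len(payoffs)   # running average reward per action
    counts = [0] * len(payoffs)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(payoffs))                 # explore
        else:
            action = max(range(len(payoffs)),
                         key=lambda a: estimates[a])             # exploit
        reward = payoffs[action]   # environment feedback (deterministic here)
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical payoffs; the agent never sees this list directly.
estimates, counts = epsilon_greedy([0.2, 0.8, 0.5])
best_action = max(range(3), key=lambda a: estimates[a])
print(best_action)
```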
16. 4 What Is Deep Learning (DL)?
◂ Deep learning is a machine learning technique that teaches computers to do what
comes naturally to humans: learn by example, from images, text and sound.
◂ Deep learning algorithms attempt to learn (multiple levels of) representation by
using a hierarchy of multiple layers. Learning can be supervised, semi-supervised or
unsupervised.
◂ If you provide the system tons of information, it begins to understand it and
respond in useful ways.
17. 4.1 Why is DL useful?
Manually designed features are often over-specified, incomplete and take a long time
to design and validate
Learned Features are easy to adapt, fast to learn
Utilize large amounts of training data
In ~2010 DL started outperforming other ML techniques
first in speech and vision, then NLP
18. 4.2 Application
◂ DL is a key technology behind driverless cars. It is also the key to voice control in
consumer devices like phones, tablets, TVs, and hands-free speakers.
◂ Medical Research
◂ Several big improvements in recent years in NLP
Machine Translation
Sentiment Analysis
Dialogue Agents
Question Answering
Text Classification …
21. 5.2 Packages
◂ NumPy: a third-party package used for numerical computations. It allows us
to work with multidimensional arrays.
◂ Matplotlib: a 2D plotting package, especially designed for visualizing
Python and NumPy computations.
◂ TensorFlow: machine learning, especially deep learning.
◂ scikit-learn (alternative): features various algorithms like support vector machines, random
forests, and k-neighbors, and it also supports Python numerical and
scientific libraries like NumPy and SciPy.
23. 5.4 Vanishing Gradient
◂ Each of the neural network's weights receives an update proportional to the partial
derivative of the error function with respect to the current weight in each iteration
of training. When that derivative becomes vanishingly small, the weight
effectively stops changing.
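A rough numeric sketch of why this matters. It assumes sigmoid activations, ten layers, and weights of 1 for simplicity; these are illustrative assumptions, not the deck's setup.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # never larger than 0.25

# Backpropagation multiplies one such factor per layer (weights taken as 1
# here for simplicity), so the gradient shrinks geometrically with depth.
gradient = 1.0
for layer in range(10):
    gradient *= sigmoid_derivative(0.0)   # 0.25, the best case

print(gradient)  # 0.25 ** 10, roughly 9.5e-07
```

Even in the best case, the update reaching the first layer is about a million times smaller than at the output.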
26. 6 Activation Functions
◂ A neural network without an activation function would simply be a linear
regression model, which is limited in complexity and has less power to
learn complex functional mappings from data such as images, videos, audio and
speech.
27. 6.1 Sigmoid (Logistic Function)
◂ A sigmoid activation squashes values to between 0 and 1. That is helpful for
updating or forgetting data: any number multiplied by 0 is 0, causing
values to disappear or be “forgotten”, while any number multiplied by 1 keeps the
same value, so that value stays the same or is “kept”. The network can
learn which data is unimportant and can be forgotten, and which data is
important to keep.
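A small sketch of the "forget" and "keep" behaviour described above. The gate inputs (-10 and 10) and the value 5.0 are arbitrary illustrative numbers.

```python
import math

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

value = 5.0
forget_gate = sigmoid(-10.0)   # close to 0
keep_gate = sigmoid(10.0)      # close to 1

print(value * forget_gate)  # nearly 0: the value is "forgotten"
print(value * keep_gate)    # nearly 5: the value is "kept"
```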
28. 6.2 Activation: Tanh
◂ The tanh activation is used to help regulate the values flowing through the
network. The tanh function squishes values to always be between -1 and 1.
29. 6.3 ReLU
◂ Takes a real-valued number and thresholds it at zero.
◂ Used within hidden layers; for the output layer, use softmax
◂ Helps avoid the vanishing gradient problem, because it does not saturate for positive inputs
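ReLU and its gradient can be written in a couple of lines. This is a scalar sketch; real libraries apply the same rule element-wise to whole arrays.

```python
def relu(x):
    """Threshold at zero: negatives become 0, positives pass through."""
    return max(0.0, x)

def relu_derivative(x):
    # Gradient is 1 for positive inputs, so it does not shrink there;
    # it is 0 for negative inputs (the value at exactly 0 is a convention).
    return 1.0 if x > 0 else 0.0

print([relu(v) for v in [-2.0, -0.5, 0.0, 3.0]])  # [0.0, 0.0, 0.0, 3.0]
```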
30. 6.4 Softmax
◂ Is a function that takes as input a vector of K real numbers, and normalizes
it into a probability distribution consisting of K probabilities proportional to
the exponentials of the input numbers
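A minimal softmax sketch. Subtracting the maximum before exponentiating is a standard numerical-stability trick added here; it does not change the result.

```python
import math

def softmax(values):
    """Turn K real numbers into K probabilities that sum to 1."""
    shift = max(values)                       # subtracting the max avoids overflow
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)  # roughly [0.09, 0.245, 0.665]
```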
32. 7.1 Feed Forward Neural network
◂ In a feed-forward neural network, information flows in only one direction: forward,
from the input nodes, through the hidden layers, to the output
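A forward pass through a tiny network can be sketched with NumPy. The layer sizes (3 inputs, 4 hidden units, 2 outputs), the ReLU activation and the random weights are assumptions for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, W1, b1, W2, b2):
    """One pass from the inputs, through one hidden layer, to the outputs."""
    hidden = relu(x @ W1 + b1)   # input -> hidden
    return hidden @ W2 + b2      # hidden -> output (no loops, no feedback)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))      # one sample with 3 hypothetical features
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

output = forward(x, W1, b1, W2, b2)
print(output.shape)  # (1, 2)
```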
33. 7.2 Recurrent NN
◂ Recurrent Neural Networks, or RNNs, were designed to work with
sequence prediction problems rather than local features.
◂ Sequence prediction problems come in many forms and are best
described by the types of inputs and outputs supported.
◂ Memorize time series input
◂ Handle sequential data
◂ Consider all inputs
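The idea of remembering earlier inputs can be sketched with a one-unit recurrent cell. The weights `w_in` and `w_rec` are hypothetical, hand-picked values; a real RNN learns them and uses vectors and matrices instead of scalars.

```python
import math

def rnn_forward(inputs, w_in=0.5, w_rec=0.8, h0=0.0):
    """Run a one-unit recurrent cell over a sequence, returning every hidden state."""
    h = h0
    states = []
    for x in inputs:
        # The new state depends on the current input AND the previous state.
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

with_signal = rnn_forward([1.0, 0.0, 0.0])
without_signal = rnn_forward([0.0, 0.0, 0.0])
print(with_signal[-1], without_signal[-1])
```

The last inputs are identical in both sequences, yet the final states differ, because the first input is still carried in the hidden state.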
34. Use RNNs For:
◂ Text data
◂ Speech data
◂ Classification prediction problems
◂ Regression prediction problems
◂ Machine Translation
Don’t Use RNNs For:
◂ Tabular data
◂ Image data
35. 7.3 Convolutional NN
◂ CNNs were designed to map image data to an output variable. They have the ability
to develop an internal representation of a two-dimensional image.
◂ The CNN input is traditionally two-dimensional, a field or matrix, but it can also be
changed to be one-dimensional, allowing the network to develop an internal
representation of a one-dimensional sequence. This allows the CNN to be used
more generally on other types of data that have a spatial relationship.
◂ For example: there is an ordered relationship between words in a document of
text, and there is an ordered relationship in the time steps of a time series.
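The one-dimensional case can be sketched as a filter sliding over a sequence. The signal and the two-element filter are hypothetical; a CNN would learn its filters rather than use a hand-picked one. As in most deep learning libraries, the filter is not flipped (strictly speaking this is cross-correlation).

```python
def conv1d(sequence, kernel):
    """Slide `kernel` over `sequence` (no padding, stride 1) and sum the products."""
    k = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]

signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
edge_detector = [-1.0, 1.0]           # hypothetical, hand-picked filter
print(conv1d(signal, edge_detector))  # [0.0, 1.0, 0.0, 0.0, -1.0]
```

The output is non-zero only where neighbouring values change, which is the kind of local, order-dependent pattern a convolution picks up.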
36. Use CNNs For:
◂ Image data
◂ Classification prediction problems
◂ Regression prediction problems
Try CNNs On:
◂ Text data
◂ Time series data
◂ Sequence input data
37. Application Example:
IMDB Movie reviews sentiment classification
◂ https://uofi.box.com/v/cs510DL
◂ 50K reviews: 25K positive, 25K negative
39. Possible Questions
• When should we use supervised learning?
• Which algorithm is best for time-series-dependent problems?
• What is 10-fold cross-validation?
Muhammad Usman Akhtar | Ph.D Scholar | Wuhan
University | School of Computer Science
42. Muhammad Usman Akhtar
Ph.D. Scholar, School Of Computer Science
Wuhan University, Wuhan, China
DEEP LEARNING
UNDERSTANDING FUNDAMENTALS
Editor's Notes
w1 and w2 are the parameters that will change. For each set of parameters we compute the objective function, then we choose the model with the highest predictive power.
In supervised learning we provide the algorithm with inputs and their corresponding desired outputs. Based on this information, it learns how to produce outputs closer to the ones we are looking for.
Sometimes we don’t have the time or resources to label the whole dataset.
Discover patterns in unlabeled data
This means that we don’t tell the algorithm what our goal is. Instead, we ask it to find some sort of dependence or underlying logic in the data provided.
learn to act based on feedback/reward
Now remember that we combine these inputs linearly and then add a non-linearity. How can we do that?
The inputs are X, and to combine them linearly we need weights. In this example the weights are a 3×4 matrix: (1×3) · (3×4) = 1×4, so we get a 1×4 output vector.
Non-linearities don’t change the shape of the expression, just its linearity.
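A quick shape check of this note with NumPy, assuming a 1×3 input and a 3×4 weight matrix. The bias term and the choice of tanh as the non-linearity are assumptions added for illustration.

```python
import numpy as np

x = np.ones((1, 3))          # one sample with 3 inputs
W = np.ones((3, 4))          # weights: 3 inputs -> 4 outputs
b = np.zeros((1, 4))         # hypothetical bias term

linear = x @ W + b           # (1x3) @ (3x4) -> (1x4)
activated = np.tanh(linear)  # the non-linearity keeps the shape

print(linear.shape, activated.shape)  # (1, 4) (1, 4)
```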
TensorFlow is a leading library for neural networks (deep, convolutional and recurrent NNs), released by Google in 2015. Google is a major force in machine learning.
For k-means and random forests, scikit-learn (sklearn) is a better choice for machine learning.
Sigmoid neurons saturate and kill gradients, so the NN will barely learn
when the neurons’ activations are close to 0 or 1 (saturated):
+ the gradient in these regions is almost zero
+ almost no signal will flow to the weights
+ if the initial weights are too large, most neurons will saturate
Sigmoid is especially used for models where we have to predict a probability as the output. Since a probability always lies between 0 and 1, sigmoid is a natural choice.
Like sigmoid, tanh neurons saturate
Unlike sigmoid, output is zero-centered
Tanh is a scaled sigmoid: tanh(x) = 2 sigm(2x) − 1
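The scaled-sigmoid identity can be checked numerically. The test points below are arbitrary.

```python
import math

def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))

# Largest difference between tanh(x) and 2*sigm(2x) - 1 over a few test points.
max_gap = max(
    abs(math.tanh(x) - (2.0 * sigm(2.0 * x) - 1.0))
    for x in [-3.0, -0.5, 0.0, 0.5, 3.0]
)
print(max_gap)  # zero up to floating-point rounding
```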
Most Deep Networks use ReLU nowadays
Range: [0, infinity)
+ Trains much faster
accelerates the convergence of SGD
due to linear, non-saturating form
+ Less expensive operations
implemented by simply thresholding a matrix at zero
+ More expressive
The problem with ReLU is that some units can be fragile during training and can die: a weight update can leave a neuron that never activates on any data point again. Put simply, ReLU can result in dead neurons.
To fix the problem of dying neurons, a modification called Leaky ReLU was introduced. It adds a small slope for negative inputs to keep the updates alive.
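A scalar sketch of Leaky ReLU. The slope of 0.01 is a commonly used default, taken here as an assumption.

```python
def leaky_relu(x, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope instead of becoming 0."""
    return x if x > 0 else slope * x

def leaky_relu_derivative(x, slope=0.01):
    # Never exactly zero, so a neuron in the negative region still gets updates.
    return 1.0 if x > 0 else slope

print(leaky_relu(-5.0), leaky_relu(2.0))  # about -0.05, and 2.0
```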
Softmax function outputs a vector that represents the probability distributions of a list of potential outcomes.