Machine Learning
4 Dummies - Part 1
Dori Waldman - Big Data Lead
Michael Winer - Data Science Lead
Where to start ?
● Machine Learning Fundamentals.
○ Quick intro to machine learning.
○ Linear Regression - Regression (spark- scala )
○ Logistic Regression - Classification (spark - scala )
● Basic code examples of a neural network.
○ Neural network example using numpy
○ Deep Learning using Keras ( TensorFlow Backend )
** Our focus is not to find the best model, but to explain the ML building blocks and combine theory with practice.
** We will use the same problem and solve it with ML (Scala) and DL (Python).
Session 1
Agenda
● More ML Algorithms :
○ Decision Tree
○ Random Forest
○ ALS Recommendation
○ ...
● Image Classification Using Deep Learning.
○ Theory behind convolutional neural network.
○ Convolutional neural network example.
● Go to production.
Next
Session
ML
FUNDAMENTALS
What Is Machine Learning ?
● Let’s say we would like to know if
tomorrow will be a good day to play
outside or not.
● We have some data about the past.
● We would like a model that predicts
whether we will play the next day given
outlook, temperature, humidity and wind.
Classic approach - Rule Based ( Deterministic )
The rules:
def will_play(outlook, temperature, windy):
    # deterministic, hand-written rules
    if outlook == 'sunny' and temperature != 'hot':
        return 'yes'
    if outlook == 'rainy' and windy:
        return 'no'
    # anything the rules do not cover stays undecided
There may be surprises..
● It's all about statistics: there is no deterministic answer.
● Instead of rules, the model needs to find the correct weight per feature.
● The model (in most cases) examines many examples and gets feedback on the
outcome; the feedback is used to adjust the model on each iteration.
● In each iteration the model tries to reduce the loss/error.
ML Approach ( Probabilistic )
ML - It Is all about weights
Supervised Vs Unsupervised Learning
Unsupervised
● There is no correct answer.
● Goal is to find underlying relations in the data
(split books into categories)
When to use unsupervised learning:
● Clustering: discover inherent groups , such
as grouping customers by purchasing
behavior .
● Recommendation: discover hidden rules,
such as people that buy X will tend to buy Y .
Supervised
● For each example we have the correct
answer/label.
● We use the correct label during model training
and evaluation.
When to use Supervised learning:
● Classification: output is a category such as
‘male’ or ‘female’
● Regression: output is a numeric value (house
price)
● Regression (How much?):
○ Predict house prices according to last year's sales data
● Classification (Which class does it belong to?):
○ Yes/No questions
● Clustering (split data into related groups):
○ Find hidden correlations in the data
○ Split books into groups
○ Principal component analysis
● Recommendation:
○ Item similarity ( hammer and nail are similar )
○ User similarity ( people with the same taste as me )
● Deep Learning ( neural networks ):
○ In addition to classic prediction it also supports image analysis (CNN), LSTM, GANs
○ Reinforcement learning
ML Types:
ML Hint:
Machine Learning Use Cases
Machine Learning Pipeline
● Convert data
● Clean data
● Feature selection
● Split data (train/test)
● Model selection
● Model tuning
● Deploy model
● Monitor model impact/feedback
● Update model every hour/day
Let’s Learn!
Linear
Regression
● We are looking for a linear function ( y = w0 + w1·x1 + w2·x2 + … + wn·xn ) that will be closest to most
of the data points.
● The distance between the line and a point is the “error” between the correct result and the
predicted value.
Linear Regression - Definition
● Continuous prediction is needed, e.g., income estimation, size, price.
● The relationship between the variables is linear. ( Old apartment example )
● Computational efficiency is an important parameter.
● There is no dependency between features. ( see also Covariance )
Linear Regression - when to use
Advantages of using Linear Regression
● Easy to explain
● High performance ( Regressions are considered robust )
Drawbacks of using Linear Regression
● Sensitive to outliers.
● Not suitable for nonlinear data
Linear Regression - Pros/Cons
Which model works better ?
Linear Regression - Measure the model
Model with lower error
(RMSE in many cases)
Links to Measurement discussion
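A minimal PySpark sketch of comparing models by RMSE ( the deck's code is Scala Spark; the Python API is analogous, and the column names are assumptions ):

from pyspark.ml.evaluation import RegressionEvaluator

# 'predictions' is the DataFrame produced by model.transform(test_df)
evaluator = RegressionEvaluator(labelCol="SalePrice",
                                predictionCol="prediction",
                                metricName="rmse")
rmse = evaluator.evaluate(predictions)  # lower is better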
Predict house prices for next year based on last year's prices.
Let’s Code - House Data
Taken from KAGGLE ( Hosts many data competitions ):
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
Let’s Code - House Data
https://www.slideshare.net/HadoopSummit/mleap-release-spark-ml-pipelines
Handle string input
One Hot
Encoder
https://www.kaggle.com/dansbecker/using-categorical-data-with-one-hot-encoding
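A minimal PySpark sketch of one-hot encoding a string column ( Spark 3 API; "Neighborhood" and df are assumptions ):

from pyspark.ml.feature import StringIndexer, OneHotEncoder

# first map the string category to a numeric index, then to a one-hot vector
indexer = StringIndexer(inputCol="Neighborhood", outputCol="neighborhood_idx")
encoder = OneHotEncoder(inputCols=["neighborhood_idx"], outputCols=["neighborhood_vec"])
df_indexed = indexer.fit(df).transform(df)
df_encoded = encoder.fit(df_indexed).transform(df_indexed)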
Let’s code
Let’s Code - House Data
Let’s Code - House Data
● maxIter : max number of iterations
● intercept : a numeric addition to the regression line
● regParam : regularization multiplier for overfitting prevention
● elasticNetParam : L1 vs. L2 ( regularization method )
● Standardization : scales numbers based on ( xi - avg(x) ) / sd(x)
○ normalizes the features ( see the sketch below )
Linear Regression - Tune
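A minimal PySpark sketch wiring these knobs together ( the deck's code is Scala Spark; parameter values and column names here are illustrative assumptions ):

from pyspark.ml.regression import LinearRegression

lr = LinearRegression(
    featuresCol="features", labelCol="SalePrice",  # assumed column names
    maxIter=100,           # max number of iterations
    fitIntercept=True,     # numeric addition to the regression line
    regParam=0.3,          # regularization multiplier
    elasticNetParam=0.5,   # 0.0 = pure L2, 1.0 = pure L1
    standardization=True)  # ( xi - avg(x) ) / sd(x)
model = lr.fit(train_df)
predictions = model.transform(test_df)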
Common Challenges & Techniques
● Overfitting means we trained the model on the data too well.
In some cases, the model remembers the values and does not learn the
pattern behind the data.
● Overfitting is more likely with nonlinear models that have more complexity.
Overfitting
● Techniques to avoid overfitting : reduce #features, regularization,
train-test-validation split, limit the algorithm's complexity, early stopping, dropout...
Overfitting : detect & avoid
Cross validation - can be very useful when the dataset is small
Hyper Parameter Tuning
● ML algorithms have hyperparameters.
● It is hard to guess the best combination of hyperparameters.
● It is recommended to search the hyperparameter
options for the best score.
● Recommended method : a parameter grid ( see the sketch below ).
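A minimal PySpark sketch of a parameter grid with cross-validation ( grid values are assumptions; lr is the linear regression estimator from the earlier sketch ):

from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import RegressionEvaluator

grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1, 0.3])
        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])
        .build())
cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=grid,
                    evaluator=RegressionEvaluator(labelCol="SalePrice", metricName="rmse"),
                    numFolds=5)  # k-fold cross validation, useful when the dataset is small
cv_model = cv.fit(train_df)      # keeps the model with the best average RMSE across folds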
● Until now : in each iteration the model tries to reduce the error
( like RMSE in linear regression ).
● Regularization is an addition to the model's loss function that
punishes high coefficients and weights.
Regularization
Why use it ?
Weights example : W1 = 0.2 , W2 = 0.4 , W3 = 0.8
W3 will contribute the most to the model's complexity.
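A sketch of the penalized loss, using the example weights above :
L1 : Loss = Error + λ · ( |W1| + |W2| + |W3| )
L2 : Loss = Error + λ · ( W1² + W2² + W3² )
With these weights the L2 penalty is 0.2² + 0.4² + 0.8² = 0.84 , and W3 alone contributes 0.64 of it.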
Regularization rate - What to choose?
● A high value makes the model simpler ( underfit danger )
● A low value keeps the model complex ( overfit danger )
Regularization Method - Which one to choose?
● L2 : weights will be centered at 0, small, and normally distributed.
Good for preventing overfitting.
● L1 : sets some of the weights to 0 to reduce model
complexity. Good for feature selection & evaluation.
Regularization
How to use?
https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c
● Feature selection / reduction :
○ Removing noisy features reduces computation and may
return more accurate results (PCA)
○ The big question : which features should stay and which should go
● Data cleaning & pre-processing ( most important ! ) :
○ Normalize data ( feature scales differ : #rooms vs. house price )
○ Transform strings in order to handle categorical data
○ Handle unbalanced data
○ Generate more features ( cross features, like location x size )
● Algorithm selection & evaluation :
○ Model selection & analysis
○ Accuracy check and tuning of the selected model
○ Avoid under/overfitting of the model
Other
Challenges
Data Preparation Trick : Bucketize
By using the bucketing technique we might reduce RMSE significantly.
Bucketing means arranging the data into groups such as ( see the sketch below ):
● Group 1 : 1-3 rooms
● Group 2 : 4-6 rooms
● Group 3 : 7+ rooms
Taken from Google course (house price prediction)
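A minimal PySpark sketch of these buckets ( column names are assumptions ):

from pyspark.ml.feature import Bucketizer

# splits define half-open intervals : [0,4) -> 1-3 rooms , [4,7) -> 4-6 , [7,inf) -> 7+
bucketizer = Bucketizer(splits=[0.0, 4.0, 7.0, float("inf")],
                        inputCol="rooms", outputCol="rooms_bucket")
df_bucketed = bucketizer.transform(df)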
Know your data : Outliers Inspection
Know your data - Perform analysis on your data
Logistic
Regression
● Probability Estimator.
● Binary Logistic regression predicts the probability that an observation falls into
one of two categories (Classification).
● Examples : Male or Female , Yes/No.
Logistic Regression
Logistic Regression - Definition
http://www.saedsayad.com/logistic_regression.htm
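As a sketch, logistic regression passes the linear combination through the sigmoid function to get a probability :
p = 1 / ( 1 + e^−(w0 + w1·x1 + … + wn·xn) )
If p ≥ threshold we predict one class, otherwise the other.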
● Classification is needed. ( Male / Female , Yes / No )
● The regression is Robust & Easy to explain
● There is low dependency between features ( Covariance ) .
Logistic Regression
Logistic Regression - when to use
Logistic Regression : Measure the model - Confusion Matrix
The confusion matrix consists of 4 values :
TP : the model correctly predicts the positive class.
TN : the model correctly predicts the negative class.
FP : the model predicts the positive class and is wrong.
FN : the model predicts the negative class and is wrong.
If the goal of linear regression was to predict a continuous number, like a house price
based on historical data, the goal of logistic regression is to predict which group (A/B)
each input belongs to, for example whether it is Male or Female.
The question is not what the groups are, but how well the model distinguishes
between them : when it predicts that the input is A it really is A according to
the label (TP), and when it predicts B (not A) it really is B (TN).
If you want to know what A (TP) and B (TN) mean, you need to check the
data : for example, if we have 100 matching predictions for Male, then group A (TP) is Male.
Logistic Regression : understand confusion matrix
● Accuracy : (TP+TN) / (TP+TN+FP+FN)
- The most common measure. Example : if 90% of patients are labeled “healthy”, we can predict
that a patient is healthy with 90% accuracy without any ML; only a model above 90% accuracy is doing something.
● Precision : % of positive predictions that are correct → TP / (TP+FP)
- A higher threshold will increase precision. This measure is important whenever you need to be
certain when you decide on a ‘Yes’ ( hiring example ).
● Recall : % of actual positives that were identified → TP / (TP+FN)
- Sometimes we need a lower threshold for positives, e.g., it is
better to send a healthy person to treatment
than to miss a sick person.
https://developers.google.com/machine-learning/crash-course/classification/accuracy
https://developers.google.com/machine-learning/crash-course/classification/prediction-bias
Logistic Regression : How to measure
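A tiny Python sketch of these formulas ( the counts are made up ):

def confusion_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # how many predicted positives are correct
    recall = tp / (tp + fn)      # how many actual positives were found
    return accuracy, precision, recall

# e.g. confusion_metrics(40, 50, 5, 5) -> accuracy 0.90 , precision ~0.89 , recall ~0.89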
Logistic Regression :
How to set the threshold - ROC Curve
Please see the link on our website.
Let’s code
We are going to predict whether the house has an air conditioner or not.
Let’s Code - House Data
● threshold : probability threshold for the yes/no decision
● maxIter : max number of iterations
● regParam : regularization multiplier for overfitting prevention
● elasticNetParam : L1 vs. L2 ( regularization method )
● Standardization : scales numbers based on ( xi - avg(x) ) / sd(x)
Logistic Regression
Logistic Regression - Tune
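A minimal PySpark sketch for the air-conditioner classifier ( the deck's code is Scala Spark; column names and values are assumptions ):

from pyspark.ml.classification import LogisticRegression

log_reg = LogisticRegression(
    featuresCol="features", labelCol="has_ac",  # assumed label column
    threshold=0.5,         # probability cutoff for the yes/no decision
    maxIter=100,
    regParam=0.3,
    elasticNetParam=0.0,   # pure L2 regularization
    standardization=True)
model = log_reg.fit(train_df)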
Deep
Learning
Machine & Deep Learning Landscape
• Deep Learning is more complicated
• Deep Learning works on “special”
tasks like image recognition and NLP
• Deep Learning does not require manual
feature selection. However, it is still
recommended to understand your data.
Deep Learning Agenda
● Theoretical Explanation of Artificial Neural Network ( Deep Learning )
● Deep Learning Steps:
○ Predict - Feed Forward
○ Calculate Errors
○ Back Propagation - Fix weights using the calculated errors.
● Code Review - Numpy
● Code Review - Keras
● Artificial neural networks consist of nodes and weights. These components
are used to extract the information from the features.
● Each layer of nodes generates an output based on the output of the previous
layer, using an activation function (chosen by us).
● Simplified : deep learning is an artificial neural network with several layers.
Deep Learning : Artificial Neural Network
https://www.quora.com/What-is-the-difference-between-Neural-Networks-and-Deep-Learning
The magic behind deep learning is based on matrix multiplication :
● First matrix is the inputs
● Second matrix is the weights.
Deep Learning- Math behind
https://www.mathsisfun.com/algebra/matrix-multiplying.html
● First, multiply the input nodes by their weights.
● After that, apply the activation function in the 2nd layer of nodes.
● Collect the output from all nodes in the layer and move to the next one.
Deep Learning - Math behind
*** The sigmoid function squashes values into the range 0-1 ( see the numpy sketch below )
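A minimal numpy sketch of one feed-forward step ( the values and layer sizes are made up ):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

X = np.array([[0.5, 0.1, 0.9]])       # 1 example , 3 input features
W1 = np.random.rand(3, 4)             # weights : 3 inputs -> 4 hidden nodes
hidden = sigmoid(X.dot(W1))           # matrix multiplication , then activation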
Deep Learning Steps
1) Initialize the network with random weights.
(Where the arrows are in the picture )
2) Forward propagation :
● Sum Inputs ( values * weights )
● Perform activation function
3) Back propagation :
● Calculate the error in the output layer
● Calculate each step's contribution to the error
● Fix each step's weights using gradient descent
4) Repeat steps 2-3 till convergence
Make A Prediction : Feed Forward Recap
● Feed forward starts by assigning random values to all weights.
● Then we multiply the inputs by the weights using matrix multiplication.
● In each node there is an ‘activation function’ that calculates the output,
which will be the input for the next layer.
● In the output layer, we get ‘predictions’ that we can compare to the real
results and calculate the error.
Make A Prediction : Activation Functions Examples
[ Figure : example activation functions - one suited to yes/no questions, one recommended for hidden layers ]
Backpropagation :
Calculate the error ( e.g., Root Mean Square Error )
● For each data point in the output layer ( y = result , y′ = prediction ) :
○ Error( y , y′ ) = ( y − y′ )²
● Total error for N data points :
○ RMSE( y , y′ , N ) = sqrt( (1/N) · Σ Error )
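The same calculation as a numpy sketch :

import numpy as np

def rmse(y, y_pred):
    # square the per-point errors , average over N , take the root
    return np.sqrt(np.mean((y - y_pred) ** 2))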
Backpropagation : Gradient Descent Concept
● In general, a gradient, calculated from f(x), determines the change of y with respect to a change in x. This can be used to find the direction to the minimum point of a function.
● In the deep learning context, we calculate the partial derivative with respect to the error and search for the direction with the minimum error for each weight ( ‘x’ in the graph ).
https://plus.maths.org/content/making-grade
Gradient Descent - Deeper explanation
Backpropagation : Gradient Descent problems
● Our goal is to aim for the lowest point of the error function. ( low y )
● Below is an example of how classic gradient descent decides the direction in which
a weight (x) should change, depending on our position.
● As you can see, it's not always that easy...
● The learning rate (LR) is a simple yet important multiplier on the pace of weight changes.
Recommended values : try 0.1 - 0.0001 ( always check other options! ).
● The LR can determine whether your weights converge to a minimum or not ( see the toy sketch below ).
Backpropagation : Adjust The weights- Learning Rate
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
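A toy Python sketch of the idea : gradient descent on a single weight for the made-up error curve f(w) = (w − 3)² :

w, lr = 0.0, 0.1          # initial weight and learning rate
for _ in range(50):
    grad = 2 * (w - 3)    # derivative of the error with respect to the weight
    w -= lr * grad        # step against the gradient
print(w)                  # converges toward 3, the minimum; a too-large lr overshoots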
● As you can see, gradient descent could lead the weights into a local minimum.
● Therefore, besides adjusting the learning rate, we also recommend trying :
weight initialization techniques , momentum , learning rate decay , dropout ...
Local Minimum Problem: What to do?
A full round of an artificial neural network using numpy
Let’s Code
Keras code example
Sales Price
Air Conditioner
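A minimal Keras sketch in the spirit of this example ( layer sizes and the stand-in data are assumptions, not the deck's actual code ):

import numpy as np
from tensorflow import keras

# stand-in data : 100 houses , 8 features , binary air-conditioner label
X_train = np.random.rand(100, 8).astype("float32")
y_train = np.random.randint(0, 2, size=(100, 1))

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,)),  # hidden layer
    keras.layers.Dense(1, activation="sigmoid"),                  # probability of "yes"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, validation_split=0.2)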
https://inneractive-ondemand.bitbucket.io
Visit us
