This was the introduction session delivered at the 2018 Global Azure Bootcamp to get users started with neural networks on Azure Machine Learning Studio. It gives an initial introduction to developing and writing neural networks. We started by building the LeNet architecture on Azure Machine Learning Studio to identify handwritten digits and then moved on to classifying cats and dogs.
This was also presented in the first workshop of my meetup,
Microsoft AI ML Community, which can be reached here:
https://www.meetup.com/Microsoft-AI-ML-Community/
8. How does machine learning help?
There are only 5 questions that machine learning can help answer
Source: Data Science For Beginners - 5 Questions Data Science Answers by Brandon Rohrer
9. 1. Is this A or B?
Real-Time Human Pose Recognition in Parts from a Single Depth Image
10. 2. Is this Weird?
Is this Weird?
Anomaly detection algorithms
11. 3. How much? How many?
How many?
How much?
Regression algorithms
12. 4. How is this organized?
How is this organized?
Clustering algorithms
13. 5. What should I do now?
What should I do now?
Reinforcement learning algorithms
15. How does it work?
Algorithm → Recipe
Your data → Ingredients
Computer → Blender
Your answer → Smoothie
16. 1. Define & initialise a model
2. Train model (process cases)
3. Validate model
…by scoring (making predictions on) a test data set and evaluating the results
4. Use it: Explore or Deploy
…visualise and study
…deploy as a (web) service
5. Update and revalidate (see the code sketch below for these steps)
How?
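The session itself walked through these five steps in the Azure Machine Learning Studio drag-and-drop designer. Purely as an illustration of the same loop in code, here is a minimal sketch in Python with scikit-learn, using the small built-in digits dataset as a stand-in for MNIST. The model type, hidden-layer size, and split ratio are arbitrary choices for the example, not values from the session.

# Minimal sketch of the five-step workflow above (illustrative only).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# 1. Define & initialise a model
model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=42)

# 2. Train the model (process cases)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)

# 3. Validate the model by scoring a test data set and evaluating the results
predictions = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, predictions))

# 4. Use it: explore the predictions, or deploy the trained model as a (web) service
# 5. Update and revalidate: retrain on new data and repeat step 3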
25. Perceptron
[Perceptron diagram: inputs X1, X2, X3 connect through weights W11, W21, W31 and a bias b to a summation node, followed by an activation that produces the output]
W (weight) is the strength of the connection between nodes; b is the bias.
Summation = W11 * X1 + W21 * X2 + W31 * X3 + b
Output = 0 if Summation <= Threshold
Output = 1 if Summation > Threshold
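As a quick check of the two formulas above, here is a minimal perceptron forward pass in Python with NumPy. The input values, weights, bias, and threshold are made-up numbers chosen purely for illustration.

import numpy as np

# Placeholder values, not taken from the session
x = np.array([1.0, 0.5, -0.5])   # X1, X2, X3
w = np.array([0.4, -0.2, 0.7])   # W11, W21, W31
b = 0.1                          # bias
threshold = 0.0

# Summation = W11 * X1 + W21 * X2 + W31 * X3 + b
summation = np.dot(w, x) + b

# Step activation: 1 if the summation exceeds the threshold, else 0
output = 1 if summation > threshold else 0
print("summation:", summation, "output:", output)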
Between X values of -2 and 2, the curve is very steep. This means that any small change in X in that region causes the value of Y to change significantly, so the function tends to push Y values towards either end of the curve.
If you notice, towards either end of the sigmoid function, the Y values respond very little to changes in X. What does that mean? The gradient in that region is going to be small, which gives rise to the problem of "vanishing gradients". So what happens when the activations reach the near-horizontal part of the curve on either side?
The gradient is small or has effectively vanished (it cannot drive a significant change because of its extremely small value). The network refuses to learn further, or learns drastically slowly (depending on the use case, and until the gradient computation hits floating-point precision limits).
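To make the vanishing-gradient point concrete, here is a small sketch that evaluates the sigmoid and its derivative, sigmoid(x) * (1 - sigmoid(x)), at a few points: the gradient peaks at 0.25 around x = 0 and shrinks rapidly towards either end of the curve. The sample x values are arbitrary.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # derivative of the sigmoid

# The gradient is largest (0.25) at x = 0 and approaches 0 far from 0,
# which is the "vanishing gradient" behaviour described above.
for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}  sigmoid = {sigmoid(x):.5f}  gradient = {sigmoid_grad(x):.6f}")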