This document contains an icebreaker game to help people get to know each other. It includes 7 questions with scrambled letters that must be unscrambled in 20 seconds. The questions cover topics in artificial intelligence and machine learning: neural networks, machine learning, optimization, linear models, artificial intelligence, logistic regression, and functions.
7. Rules:
- The game includes 1 small example and 7 questions
- Based on the hints given (pictures and scrambled letters), you will
have 20 seconds to come up with the correct word
- The first person to give a correct answer gets 1 point.
24. Basic idea of Univariate LR:
Given some data points (x, y).
We find the red line y = wx + b that best describes / fits the data!
Index  x   y
0      1   2
1      2   4
2      3   6
3      4   8
4      5   10
5      6   12
6      7   14
7      8   16
8      9   18
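For this table the best-fit line can be computed directly; here is a small sketch using the closed-form least-squares formulas (the variable names are our own):

```python
# Fitting y = w*x + b to the table above with closed-form least squares.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2, 4, 6, 8, 10, 12, 14, 16, 18]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Slope: covariance(x, y) / variance(x)
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
b = mean_y - w * mean_x
print(w, b)  # the data lies exactly on y = 2x, so w = 2.0, b = 0.0
```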
25. What does it mean by “best fit”?
We define a function of w and b, called the loss function L, for
instance the sum of squared errors.
The best-fit line is the line that minimizes the loss function!
It’s just an optimization problem
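One standard choice of loss function, matching the SSE named in the model-building steps, is the sum of squared errors over the data points:

```latex
L(w, b) = \sum_{i} \bigl( y_i - (w x_i + b) \bigr)^2
```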
36. Let’s backtrack
There are 4 steps to build a machine learning model
- Step 1: Collect the relevant data (the data points)
- Step 2: Choose a suitable model (linear regression for example)
- Step 3: Choose a loss function with respect to the data (such as SSE)
- Step 4: Minimize the loss function with respect to the parameters using an
optimization algorithm (Gradient Descent)
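The 4 steps can be sketched end to end; this is a minimal illustration assuming SSE loss and plain gradient descent on univariate linear regression (all names and numbers are our own):

```python
# Step 1: collect the data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

# Step 2: choose a model, y_hat = w*x + b
w, b = 0.0, 0.0

# Step 3: choose a loss, here sum of squared errors
def sse(w, b):
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))

# Step 4: minimize the loss with gradient descent
lr = 0.01
for _ in range(2000):
    grad_w = sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys))
    grad_b = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys))
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 3), round(b, 3))  # approaches w = 2, b = 0
```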
40. Sigmoid function, an activation function
Yet another function to learn, huh?
Sometimes we want the output of the function to be:
- Strictly increasing and analytic
- Bounded in (0, 1), so it can be read as a probability
The sigmoid function is a perfect fit for both of these
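The sigmoid described above is short enough to write down directly; a quick sketch:

```python
import math

def sigmoid(z):
    # Strictly increasing, smooth, and bounded in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # exactly 0.5
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```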
41. Logistic regression idea
In logistic regression, we apply the sigmoid function on top of linear regression in
order to squeeze the output range into (0, 1)
Basically, that’s sigmoid(linear regression)
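In code, "sigmoid(linear regression)" is literally one line; the weights below are made-up illustration values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, w, b):
    # Sigmoid applied on top of a linear model: output lands in (0, 1).
    return sigmoid(w * x + b)

p = logistic_regression(2.0, w=1.5, b=-1.0)
print(p)  # a probability strictly between 0 and 1
```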
42. What is wrong with sum of squares?
The choice of loss function
With sum of squares, the logistic loss is non-convex, and gradient descent may
only reach a local minimum on non-convex functions.
Since the labels in a logistic problem can only be binary, we can be a little bit
smarter
43. Binary cross entropy
Formula of binary cross entropy
In short, by using this loss function, the optimization problem is now convex.
Note: the multiclass variant of this is cross entropy
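A sketch of binary cross entropy as code, averaging over N examples with 0/1 labels ys and predicted probabilities ps (names are our own):

```python
import math

def binary_cross_entropy(ys, ps):
    # L = -(1/N) * sum( y*log(p) + (1-y)*log(1-p) )
    n = len(ys)
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(ys, ps)
    ) / n

# Confident, correct predictions give a small loss...
low = binary_cross_entropy([1, 0], [0.99, 0.01])
# ...confident, wrong predictions give a large loss.
high = binary_cross_entropy([1, 0], [0.01, 0.99])
print(low, high)
```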
45. Optimization?
Sadly, there is no closed-form solution for Logistic Regression. Here
is the gradient for the univariate case; good luck solving those equations by hand :)
Thus the only practical choice here is Gradient Descent.
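Gradient descent on the univariate case looks like this; a sketch assuming binary cross entropy, where the gradients conveniently reduce to (y_hat - y) terms (data and learning rate are our own toy choices):

```python
import math

# Toy separable data: negative x -> label 0, positive x -> label 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, lr = 0.0, 0.0, 0.1
for _ in range(1000):
    preds = [sigmoid(w * x + b) for x in xs]
    # BCE gradients for sigmoid(w*x + b) simplify to (y_hat - y) terms.
    grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    grad_b = sum((p - y) for p, y in zip(preds, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(sigmoid(w * -2.0 + b), sigmoid(w * 2.0 + b))  # near 0 and near 1
```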
47. Insight of Logistic Regression
Motivation for neural networks
Logistic regression can be used to infer whether a feature is active or not
according to the given features. By stacking up multiple logistic regressions, we
get the fundamental idea of a neural network
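The stacking idea can be made concrete with a tiny two-layer network in which every unit is itself a logistic regression. The hand-picked weights below solve XOR, something a single logistic regression cannot do (the weights are our own illustrative values, not learned):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # Each neuron is a logistic regression over its inputs.
    return sigmoid(sum(w * i for w, i in zip(weights, inputs)) + bias)

def xor_net(x1, x2):
    h1 = neuron([x1, x2], [20, 20], -10)    # approximates OR
    h2 = neuron([x1, x2], [-20, -20], 30)   # approximates NAND
    return neuron([h1, h2], [20, 20], -30)  # approximates AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_net(a, b)))  # XOR truth table: 0, 1, 1, 0
```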
52. “Later Perceptrons will be able to recognize people
and call out their names and instantly translate
speech in one language to speech or writing in
another language, it was predicted.”
NY Times ━ "New Navy Device Learns By Doing" 7/7/1958
64. “Data scientists call the layer-by-layer process of
matrix multiplication followed by non-linear
activation functions, transforming the feature
space.”
Alando Ballantyne ━ Minsky's "And / Or" Theorem: A Single Perceptron Limitations.
65. Let’s recap!
Neural Network in a nutshell
Step 1: Determine the network structure (the number of layers, which activation
functions to choose).
Step 2: Load the data as input to the Neural Network
Repeat {
Step 3: Forward the data through the network, calculate the Loss function.
Step 4: Backpropagation to find the weight gradients → update the weights.
} Until convergence.
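The recap above can be sketched as a minimal runnable loop: one hidden sigmoid unit, one output sigmoid unit, forward pass plus hand-written backpropagation on a toy 1D task (every name and number here is our own illustration, not a recipe from the slides):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 1: network structure -- 1 input -> 1 hidden unit -> 1 output, sigmoid everywhere.
w1, b1, w2, b2 = 0.5, 0.0, 0.5, 0.0

# Step 2: load the data, (x, label) pairs.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

lr = 0.5
for _ in range(3000):  # Repeat { ... } until (approximate) convergence.
    for x, y in data:
        # Step 3: forward pass (loss would be binary cross entropy).
        h = sigmoid(w1 * x + b1)
        y_hat = sigmoid(w2 * h + b2)
        # Step 4: backpropagation (chain rule), then weight update.
        d_out = y_hat - y                 # gradient at the output pre-activation
        d_hid = d_out * w2 * h * (1 - h)  # pushed back through the hidden unit
        w2 -= lr * d_out * h
        b2 -= lr * d_out
        w1 -= lr * d_hid * x
        b1 -= lr * d_hid

preds = [sigmoid(w2 * sigmoid(w1 * x + b1) + b2) for x, _ in data]
print(preds)
```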
68. “A Neural Network can approximate any
well-behaved function 𝟋 by using the same
construction for the first layer and approximating the
identity function with later layers.”
Universal Approximation Theorem