MLCC #4 Neural Network
Presented by Ofa

2018.7.18
Agenda
• Introduction to NN

• Backpropagation

• Training Neural Networks

• Multi-Class NN

• Embeddings

• ML Engineering
Introduction to Artificial Neural
Network
What is ANN?
• First we may need to think about what is INTELLIGENCE?
Intelligence
The Octopus https://goo.gl/eUS7nS
What is ANN?
What is ANN?
Non-linear Problems
Linear solver + linear solver = still a linear solver!!!
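A quick numpy check of that claim (the shapes and random values here are arbitrary): two stacked linear layers with no activation collapse into a single linear layer with merged weights.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # first linear layer
W2 = rng.normal(size=(4, 2))   # second linear layer
x = rng.normal(size=(5, 3))    # a batch of 5 inputs

two_layers = (x @ W1) @ W2     # stacked linear layers...
one_layer = x @ (W1 @ W2)      # ...equal one layer with merged weights

assert np.allclose(two_layers, one_layer)
print("still linear")
```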
Non-linear Problems
Real Case: TCM price
Price from store (per qian, 錢)
Price from origin (per jin, 斤)
ReLU
adding a non-linear activation function => a non-linear model
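ReLU itself is a one-liner; a minimal sketch:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: element-wise max(0, x)."""
    return np.maximum(0.0, x)

# The kink at zero is what makes the resulting model non-linear:
print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```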
Activation Functions
More activation functions:
https://www.tensorflow.org/api_guides/python/nn
Playground - with 1 hidden layer node
Linear activation Sigmoid activation
ReLU activation
Playground - with 2 hidden layer nodes
Linear activation Sigmoid activation
ReLU activation
#sometimes converges to a different result
Playground - challenge 0.177 loss
First trial Remove empty nodes
#L2 regularization is required
Playground - initialization
First trial Second trial
#DIY
Playground - Spiral
First trial Second trial
You can still reach good performance by tuning only the
parameters, rather than doing feature engineering
Playground - Spiral
First trial Second trial
It is a trade-off between more features and more computing
power ($$$)
Programming Exercise
OK… it’s the number of steps and the batch size that matter…
Backpropagation
Backpropagation
• How data flows through the graph.

• How dynamic programming lets us
avoid computing exponentially many
paths through the graph.
Backpropagation
• Update weights according to
the error

• w_ij = w_ij - a * dE/dw_ij
Backpropagation
• Starting from the output layer!

• E = 1/2 (y_output - y_target)^2, so dE/dy_output = y_output - y_target
Backpropagation
• Go backward for each node

• dE/dw_ij = dx_j/dw_ij * dE/dx_j

           = y_i * dE/dx_j

• # because x_j = Σ_i y_i * w_ij, so dx_j/dw_ij = y_i

For example, for w46 (the weight from node 4 into node 6):

dE/dw46 = dx6/dw46 * dE/dx6

        = y4 * dE/dx6
–Trust me, it’s too complicated to understand
“What on earth are you talking about?” (哩喜勒勒公三小, Taiwanese)
Example
Input Layer Hidden Layer Output Layer
Bias
yh1 = f(0.3825) = 0.5944
To compute the sigmoid, you can use:
https://goo.gl/Jiuw2p
Try to compute yh2, o1, o2 by yourself
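Instead of the calculator link, the logistic sigmoid is easy to code directly; it reproduces the slide’s values (to the slide’s precision):

```python
import math

def sigmoid(x):
    """Logistic sigmoid f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.3825))  # ~0.5944  (yh1)
print(sigmoid(1.106))   # ~0.7513  (o1)
print(sigmoid(1.225))   # ~0.7729  (o2)
```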
Reference
Example
Input Layer Hidden Layer Output Layer
Bias
So we get:
yh1= 0.5944
yh2 = 0.5968
o1 = f(1.106) = 0.7513
o2 = f(1.225) = 0.7729
Example
Input Layer Hidden Layer Output Layer
Bias
Then we can update the weights:
OutputO1 = 0.75
OutputO2 = 0.773
Etotal = EO1 + EO2
= 1/2*(0.01 - 0.75)^2 +
1/2*(0.99 - 0.773)^2
≈ 0.2974
w5new = w5old - a*dEtotal/dw5
Example
Then we can get:
w5new = w5old - a*dEtotal/dw5
so, w5new = 0.4 - 0.5 * 0.082
= 0.359
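The same update as a couple of lines of Python (the learning rate a = 0.5 and the gradient 0.082 are the slide’s values):

```python
a = 0.5                  # learning rate
dE_dw5 = 0.082           # dEtotal/dw5 from the slide
w5_old = 0.4

w5_new = w5_old - a * dE_dw5
print(round(w5_new, 3))  # 0.359
```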
Example
Then we can get:
w5new = 0.359
w6new = 0.4
w7new = 0.51
w8new = 0.56
Next, we need to update the first
layer, i.e., w1~w4
Input Layer Hidden Layer Output Layer
Bias
Example
Input Layer Hidden Layer Output Layer
Bias
#reusing the error terms we already computed for the output layer
Then we can also propagate the error back to get the gradients for w1~w4:
Example
Input Layer Hidden Layer Output Layer
Bias
Example
Input Layer Hidden Layer Output Layer
Bias
Then we can get:
w1new = 0.1497
w2new = 0.1995
w3new = 0.2497
w4new = 0.2995
Brief Summary
• You can do this update for all the weights; just remember to update
them all together instead of one by one. 

• That means you should always compute each update from the old weights; do not
mix old and new values within one step.

• But in practice, just calling nn.train() is the best way to do it!
Training Neural Nets
• Things to note:

• The loss and activations must be differentiable so we can learn from the gradients.

• Gradients can vanish or explode as layers are added; remedies include ReLUs,
tuning the learning rate, and batch normalization.

• Lower-layer gradients can shrink toward zero, which makes training slow;
using ReLU helps prevent this (vanishing gradients).

• If weights grow too large, they can make lower-layer gradients explode; use
batch normalization to avoid this.

• ReLU layers can die: lower the learning rate
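A toy numeric illustration of the vanishing-gradient point above: backprop multiplies one activation derivative per layer, the sigmoid’s derivative is at most 0.25, while ReLU’s is 1 on its active side (the depth of 10 is arbitrary):

```python
layers = 10

# Upper bound on the gradient factor contributed by the activations alone:
sigmoid_chain = 0.25 ** layers   # sigmoid'(x) <= 0.25 everywhere
relu_chain = 1.0 ** layers       # relu'(x) = 1 where the unit is active

print(sigmoid_chain)  # ~9.5e-07: the lower layers barely learn
print(relu_chain)     # 1.0: the signal survives
```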
Dropout Regularization
• Randomly drop out units in
the network for a single
gradient step.

• The rate is controlled from 0.0 to
1.0; a rate of 1.0 drops out all nodes,
and the network learns nothing!

• This regularization mechanism has
helped make deep learning practical
in recent years
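A minimal numpy sketch of (“inverted”) dropout for one gradient step; the function name and shapes are illustrative:

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero each unit with probability `rate`; rescale survivors so the
    expected activation is unchanged (inverted dropout). rate must be
    < 1.0; rate=1.0 would drop every node and nothing would be learned."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
h = np.ones((2, 8))                    # pretend hidden-layer activations
print(dropout(h, rate=0.5, rng=rng))   # entries: 0.0 (dropped) or 2.0 (kept, rescaled)
```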
Programming Exercises: Normalization
This way is much simpler than the solution…
Programming Exercises: Optimizer
AdagradOptimizer:

Automatically reduces the effective learning rate for each parameter

RMSE=122.29 / 124.10

AdamOptimizer:

Adaptive Moment Estimation,
computes adaptive learning rates for
each parameter.

RMSE= 67.67 / 67.48
Reference:
http://ruder.io/optimizing-gradient-descent/
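The heart of Adagrad’s per-parameter adaptation fits in a few lines of numpy; this is the textbook update rule, not TensorFlow’s internals, and the values are illustrative:

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """Divide each parameter's step by the root of its accumulated
    squared gradients, so its effective learning rate shrinks over time."""
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

w = np.array([1.0, 1.0])
accum = np.zeros_like(w)
for _ in range(3):
    w, accum = adagrad_step(w, np.array([1.0, 0.1]), accum)

# Despite 10x different gradients, both parameters take similar steps,
# because each step is scaled by that parameter's own gradient history:
print(w)
```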
Programming Exercises: Normalization+
You can pass the normalization into
the function options, which makes it
simpler

z_score, RMSE: 71.54 / 70.39

binary_threshold(0.5), RMSE:
115.78 / 116.41

clip(0.1, 0.8), RMSE: 115.77 / 116.33

log_normalize: fails with a math error (log of non-positive values)
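Hand-rolled versions of those transforms make the results, and the math error, easy to see (the array values are made up; the exercise’s real helpers live in the notebook):

```python
import numpy as np

x = np.array([-2.0, 0.0, 1.0, 5.0])

z_score = (x - x.mean()) / x.std()
binary_threshold = (x > 0.5).astype(float)   # threshold at 0.5
clipped = np.clip(x, 0.1, 0.8)

print(z_score)
print(binary_threshold)  # [0. 0. 1. 1.]
print(clipped)           # [0.1 0.1 0.8 0.8]
# log_normalize is the odd one out: np.log(x) on values <= 0
# is exactly the "math error" above.
```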
Multi-Class NN
Multi-Class NN
One-class NN Multi-Class NN
See Food
The ‘See Food’ app from Silicon Valley really happened, and it was also a lie
“Meal Snap”
See Food
• Multi-class, single label: this
is a hotdog, an octopus, or a
banana

• => softmax (use candidate
sampling when there are many classes)

• Multi-class, multi-label: this
picture contains hotdog,
cucumber, tomato, and onion

• => one logistic regression
(sigmoid) output per class
Softmax
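The two output-layer options above, sketched in numpy (the three logits are toy values for hotdog / octopus / banana):

```python
import numpy as np

def softmax(logits):
    """Single-label: a probability distribution over mutually exclusive classes."""
    z = np.exp(logits - logits.max())   # shift for numerical stability
    return z / z.sum()

def per_class_sigmoid(logits):
    """Multi-label: one independent probability per class."""
    return 1.0 / (1.0 + np.exp(-logits))

logits = np.array([2.0, 0.5, -1.0])
print(softmax(logits))            # sums to 1.0: pick exactly one class
print(per_class_sigmoid(logits))  # each in (0, 1): any subset of classes
```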
AWS Sagemaker
Ref: https://goo.gl/3HMkPR
Embeddings
Collaborative Filtering
Step 1. Preprocessing: build a dict of
all movies
Step 2. Encode each user’s behavior
as a sparse representation
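Those two steps in plain Python (the movie titles and the user’s history are made up for illustration):

```python
# Step 1. Preprocessing: build a dict mapping every movie to an index.
movies = ["Shrek", "The Matrix", "Spirited Away", "Amelie"]
vocab = {title: i for i, title in enumerate(movies)}

# Step 2. Encode one user's behavior as a sparse (multi-hot) vector.
watched = ["Shrek", "Amelie"]
sparse = [0] * len(vocab)
for title in watched:
    sparse[vocab[title]] = 1

print(sparse)  # [1, 0, 0, 1]
```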
Embeddings
Embeddings
• Embed the data into a
d-dimensional space,
mapping items to low-
dimensional real vectors 

• the number of dimensions is
usually determined
empirically
hidden layers
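Mechanically, an embedding layer is just a learned matrix, and looking up an item’s vector is a row index; the sizes below are illustrative:

```python
import numpy as np

vocab_size, d = 4, 3                           # 4 items, d chosen empirically
rng = np.random.default_rng(0)
embedding = rng.normal(size=(vocab_size, d))   # learned during training

item_id = 2
vector = embedding[item_id]                    # the item's low-dimensional real vector
print(vector.shape)  # (3,)
```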
Example
PCA
Reference:
[1] https://goo.gl/XetAUb (right)
[2] https://goo.gl/HctuRj (left, including python examples)
Word2Vec
Ray Hsueh gave an awesome talk last week!!!
ML Engineering
Production ML Systems
What we’ve learned so far…
Static vs Dynamic Training
• Static - Trained offline. For data that do not change much over time.

• Pros: easy to build and test; batch train, then test and iterate until good

• Cons: inputs still require monitoring; the model easily grows stale

• Dynamic - Trained online. 

• Pros: data can be fed in continuously, with updated versions regularly synced
out. Uses progressive validation rather than batch training & testing. Adapts to changes.

• Cons: needs monitoring, plus model-rollback & data-quarantine capabilities
Static vs Dynamic Inference
• Static - Inference offline (batch prediction). For data that do not change much over time.

• Pros: much lower computational cost

• Cons: you need all the data at hand; update latency can be very long

• Dynamic - Inference online. 

• Pros: can predict on the newest data

• Cons: latency is higher, and you need budget to bring it down
Data Dependencies
• Feature and data changes have a huge impact on the model

• Unit tests for data?

• Reliability: what happens if an input signal disappears?

• Versioning: do features change over time?

• Necessity: how useful is a feature relative to its computational cost?

• Correlations: are features tied together, or can they be teased apart?

• Feedback loops: could my inputs be affected by my own outputs?
Real World Examples
Cancer Prediction
• Hospitals that specialize in cancer treatment made the model overfit: the
hospital a record came from effectively revealed the label

• => label leakage, which works just like cheating
Real World Guidelines
• Keep the very first model extremely simple

• Focus on data pipeline correctness

• Use a simple, observable metric for training & evaluation

• Own and monitor your input features

• Treat your model configuration as code: review it, check it in

• Write down the results of all experiments, especially “failures”
Good Bye!

More Related Content

Similar to Mlcc #4

backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networks
Akash Goel
 
Neural Network Part-2
Neural Network Part-2Neural Network Part-2
Neural Network Part-2
Venkata Reddy Konasani
 
ICPSR - Complex Systems Models in the Social Sciences - Lab Session 6 - Profe...
ICPSR - Complex Systems Models in the Social Sciences - Lab Session 6 - Profe...ICPSR - Complex Systems Models in the Social Sciences - Lab Session 6 - Profe...
ICPSR - Complex Systems Models in the Social Sciences - Lab Session 6 - Profe...Daniel Katz
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning Systems
Anuj Gupta
 
CPP10 - Debugging
CPP10 - DebuggingCPP10 - Debugging
CPP10 - Debugging
Michael Heron
 
Algorithm-RepetitionSentinellNestedLoop_Solution.pptx
Algorithm-RepetitionSentinellNestedLoop_Solution.pptxAlgorithm-RepetitionSentinellNestedLoop_Solution.pptx
Algorithm-RepetitionSentinellNestedLoop_Solution.pptx
AliaaAqilah3
 
Case Study of the Unexplained
Case Study of the UnexplainedCase Study of the Unexplained
Case Study of the Unexplainedshannomc
 
Dutch PHP Conference 2013: Distilled
Dutch PHP Conference 2013: DistilledDutch PHP Conference 2013: Distilled
Dutch PHP Conference 2013: Distilled
Zumba Fitness - Technology Team
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
Kien Le
 
Performance tuning the Spring Pet Clinic sample application
Performance tuning the Spring Pet Clinic sample applicationPerformance tuning the Spring Pet Clinic sample application
Performance tuning the Spring Pet Clinic sample application
Julien Dubois
 
Automated Scaling of Microservice Stacks for JavaEE Applications
Automated Scaling of Microservice Stacks for JavaEE ApplicationsAutomated Scaling of Microservice Stacks for JavaEE Applications
Automated Scaling of Microservice Stacks for JavaEE Applications
Jelastic Multi-Cloud PaaS
 
How EVERFI Moved from No Automation to Continuous Test Generation in 9 Months
How EVERFI Moved from No Automation to Continuous Test Generation in 9 MonthsHow EVERFI Moved from No Automation to Continuous Test Generation in 9 Months
How EVERFI Moved from No Automation to Continuous Test Generation in 9 Months
Applitools
 
Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Devel::NYTProf 2009-07 (OUTDATED, see 201008)Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Tim Bunce
 
Soft quality & standards
Soft quality & standardsSoft quality & standards
Soft quality & standards
Prince Bhanwra
 
Soft quality & standards
Soft quality & standardsSoft quality & standards
Soft quality & standards
Prince Bhanwra
 
Introduction to Deep learning and H2O for beginner's
Introduction to Deep learning and H2O for beginner'sIntroduction to Deep learning and H2O for beginner's
Introduction to Deep learning and H2O for beginner's
Vidyasagar Bhargava
 
Agile Experiments in Machine Learning
Agile Experiments in Machine LearningAgile Experiments in Machine Learning
Agile Experiments in Machine Learning
mathias-brandewinder
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
Te-Yen Liu
 
Introduction to c first week slides
Introduction to c first week slidesIntroduction to c first week slides
Introduction to c first week slides
luqman bawany
 
ch02-primitive-data-definite-loops.ppt
ch02-primitive-data-definite-loops.pptch02-primitive-data-definite-loops.ppt
ch02-primitive-data-definite-loops.ppt
Mahyuddin8
 

Similar to Mlcc #4 (20)

backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networks
 
Neural Network Part-2
Neural Network Part-2Neural Network Part-2
Neural Network Part-2
 
ICPSR - Complex Systems Models in the Social Sciences - Lab Session 6 - Profe...
ICPSR - Complex Systems Models in the Social Sciences - Lab Session 6 - Profe...ICPSR - Complex Systems Models in the Social Sciences - Lab Session 6 - Profe...
ICPSR - Complex Systems Models in the Social Sciences - Lab Session 6 - Profe...
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning Systems
 
CPP10 - Debugging
CPP10 - DebuggingCPP10 - Debugging
CPP10 - Debugging
 
Algorithm-RepetitionSentinellNestedLoop_Solution.pptx
Algorithm-RepetitionSentinellNestedLoop_Solution.pptxAlgorithm-RepetitionSentinellNestedLoop_Solution.pptx
Algorithm-RepetitionSentinellNestedLoop_Solution.pptx
 
Case Study of the Unexplained
Case Study of the UnexplainedCase Study of the Unexplained
Case Study of the Unexplained
 
Dutch PHP Conference 2013: Distilled
Dutch PHP Conference 2013: DistilledDutch PHP Conference 2013: Distilled
Dutch PHP Conference 2013: Distilled
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
 
Performance tuning the Spring Pet Clinic sample application
Performance tuning the Spring Pet Clinic sample applicationPerformance tuning the Spring Pet Clinic sample application
Performance tuning the Spring Pet Clinic sample application
 
Automated Scaling of Microservice Stacks for JavaEE Applications
Automated Scaling of Microservice Stacks for JavaEE ApplicationsAutomated Scaling of Microservice Stacks for JavaEE Applications
Automated Scaling of Microservice Stacks for JavaEE Applications
 
How EVERFI Moved from No Automation to Continuous Test Generation in 9 Months
How EVERFI Moved from No Automation to Continuous Test Generation in 9 MonthsHow EVERFI Moved from No Automation to Continuous Test Generation in 9 Months
How EVERFI Moved from No Automation to Continuous Test Generation in 9 Months
 
Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Devel::NYTProf 2009-07 (OUTDATED, see 201008)Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Devel::NYTProf 2009-07 (OUTDATED, see 201008)
 
Soft quality & standards
Soft quality & standardsSoft quality & standards
Soft quality & standards
 
Soft quality & standards
Soft quality & standardsSoft quality & standards
Soft quality & standards
 
Introduction to Deep learning and H2O for beginner's
Introduction to Deep learning and H2O for beginner'sIntroduction to Deep learning and H2O for beginner's
Introduction to Deep learning and H2O for beginner's
 
Agile Experiments in Machine Learning
Agile Experiments in Machine LearningAgile Experiments in Machine Learning
Agile Experiments in Machine Learning
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
Introduction to c first week slides
Introduction to c first week slidesIntroduction to c first week slides
Introduction to c first week slides
 
ch02-primitive-data-definite-loops.ppt
ch02-primitive-data-definite-loops.pptch02-primitive-data-definite-loops.ppt
ch02-primitive-data-definite-loops.ppt
 

More from Chung-Hsiang Ofa Hsueh

Secret weapons for startups
Secret weapons for startupsSecret weapons for startups
Secret weapons for startups
Chung-Hsiang Ofa Hsueh
 
Head first latex
Head first latexHead first latex
Head first latex
Chung-Hsiang Ofa Hsueh
 
MLCC #2
MLCC #2MLCC #2
2018.06.03.the hard thing about hard things for hpx-kh
2018.06.03.the hard thing about hard things for hpx-kh2018.06.03.the hard thing about hard things for hpx-kh
2018.06.03.the hard thing about hard things for hpx-kh
Chung-Hsiang Ofa Hsueh
 
YC Startup School 2016 Info sharing@inbetween international
YC Startup School 2016 Info sharing@inbetween internationalYC Startup School 2016 Info sharing@inbetween international
YC Startup School 2016 Info sharing@inbetween international
Chung-Hsiang Ofa Hsueh
 
2016.7.19 汽車駭客手冊
2016.7.19 汽車駭客手冊2016.7.19 汽車駭客手冊
2016.7.19 汽車駭客手冊
Chung-Hsiang Ofa Hsueh
 
2016.6.17 TEIL group meeting
2016.6.17 TEIL group meeting2016.6.17 TEIL group meeting
2016.6.17 TEIL group meeting
Chung-Hsiang Ofa Hsueh
 
Introduction of Silicon Valley Innovation Safari
Introduction of Silicon Valley Innovation SafariIntroduction of Silicon Valley Innovation Safari
Introduction of Silicon Valley Innovation Safari
Chung-Hsiang Ofa Hsueh
 
Ec x fintech
Ec x fintechEc x fintech
2016.3.22 從車庫的舊pc到百萬台伺服器
2016.3.22 從車庫的舊pc到百萬台伺服器2016.3.22 從車庫的舊pc到百萬台伺服器
2016.3.22 從車庫的舊pc到百萬台伺服器
Chung-Hsiang Ofa Hsueh
 
2015.6.29 以色列新創背包攻略本
2015.6.29 以色列新創背包攻略本2015.6.29 以色列新創背包攻略本
2015.6.29 以色列新創背包攻略本
Chung-Hsiang Ofa Hsueh
 
2015.11.21 Scrum:用一半的時間做兩倍的事
2015.11.21 Scrum:用一半的時間做兩倍的事2015.11.21 Scrum:用一半的時間做兩倍的事
2015.11.21 Scrum:用一半的時間做兩倍的事
Chung-Hsiang Ofa Hsueh
 
2015.10.31 淺談矽谷的fintech趨勢
2015.10.31 淺談矽谷的fintech趨勢2015.10.31 淺談矽谷的fintech趨勢
2015.10.31 淺談矽谷的fintech趨勢
Chung-Hsiang Ofa Hsueh
 
Pretotype it
Pretotype itPretotype it
2015.9.2 矽谷與以色列的祕密醬汁
2015.9.2 矽谷與以色列的祕密醬汁2015.9.2 矽谷與以色列的祕密醬汁
2015.9.2 矽谷與以色列的祕密醬汁
Chung-Hsiang Ofa Hsueh
 
2015.1.5 os.server.keyterms
2015.1.5 os.server.keyterms2015.1.5 os.server.keyterms
2015.1.5 os.server.keyterms
Chung-Hsiang Ofa Hsueh
 
2015.06.16 why silicon valley matters
2015.06.16 why silicon valley matters2015.06.16 why silicon valley matters
2015.06.16 why silicon valley matters
Chung-Hsiang Ofa Hsueh
 
2015.3.12 the root of lisp
2015.3.12 the root of lisp2015.3.12 the root of lisp
2015.3.12 the root of lisp
Chung-Hsiang Ofa Hsueh
 
2015.4.10 守護程序ii 自由之戰
2015.4.10 守護程序ii 自由之戰2015.4.10 守護程序ii 自由之戰
2015.4.10 守護程序ii 自由之戰
Chung-Hsiang Ofa Hsueh
 
2015.4.7 startup nation
2015.4.7 startup nation2015.4.7 startup nation
2015.4.7 startup nation
Chung-Hsiang Ofa Hsueh
 

More from Chung-Hsiang Ofa Hsueh (20)

Secret weapons for startups
Secret weapons for startupsSecret weapons for startups
Secret weapons for startups
 
Head first latex
Head first latexHead first latex
Head first latex
 
MLCC #2
MLCC #2MLCC #2
MLCC #2
 
2018.06.03.the hard thing about hard things for hpx-kh
2018.06.03.the hard thing about hard things for hpx-kh2018.06.03.the hard thing about hard things for hpx-kh
2018.06.03.the hard thing about hard things for hpx-kh
 
YC Startup School 2016 Info sharing@inbetween international
YC Startup School 2016 Info sharing@inbetween internationalYC Startup School 2016 Info sharing@inbetween international
YC Startup School 2016 Info sharing@inbetween international
 
2016.7.19 汽車駭客手冊
2016.7.19 汽車駭客手冊2016.7.19 汽車駭客手冊
2016.7.19 汽車駭客手冊
 
2016.6.17 TEIL group meeting
2016.6.17 TEIL group meeting2016.6.17 TEIL group meeting
2016.6.17 TEIL group meeting
 
Introduction of Silicon Valley Innovation Safari
Introduction of Silicon Valley Innovation SafariIntroduction of Silicon Valley Innovation Safari
Introduction of Silicon Valley Innovation Safari
 
Ec x fintech
Ec x fintechEc x fintech
Ec x fintech
 
2016.3.22 從車庫的舊pc到百萬台伺服器
2016.3.22 從車庫的舊pc到百萬台伺服器2016.3.22 從車庫的舊pc到百萬台伺服器
2016.3.22 從車庫的舊pc到百萬台伺服器
 
2015.6.29 以色列新創背包攻略本
2015.6.29 以色列新創背包攻略本2015.6.29 以色列新創背包攻略本
2015.6.29 以色列新創背包攻略本
 
2015.11.21 Scrum:用一半的時間做兩倍的事
2015.11.21 Scrum:用一半的時間做兩倍的事2015.11.21 Scrum:用一半的時間做兩倍的事
2015.11.21 Scrum:用一半的時間做兩倍的事
 
2015.10.31 淺談矽谷的fintech趨勢
2015.10.31 淺談矽谷的fintech趨勢2015.10.31 淺談矽谷的fintech趨勢
2015.10.31 淺談矽谷的fintech趨勢
 
Pretotype it
Pretotype itPretotype it
Pretotype it
 
2015.9.2 矽谷與以色列的祕密醬汁
2015.9.2 矽谷與以色列的祕密醬汁2015.9.2 矽谷與以色列的祕密醬汁
2015.9.2 矽谷與以色列的祕密醬汁
 
2015.1.5 os.server.keyterms
2015.1.5 os.server.keyterms2015.1.5 os.server.keyterms
2015.1.5 os.server.keyterms
 
2015.06.16 why silicon valley matters
2015.06.16 why silicon valley matters2015.06.16 why silicon valley matters
2015.06.16 why silicon valley matters
 
2015.3.12 the root of lisp
2015.3.12 the root of lisp2015.3.12 the root of lisp
2015.3.12 the root of lisp
 
2015.4.10 守護程序ii 自由之戰
2015.4.10 守護程序ii 自由之戰2015.4.10 守護程序ii 自由之戰
2015.4.10 守護程序ii 自由之戰
 
2015.4.7 startup nation
2015.4.7 startup nation2015.4.7 startup nation
2015.4.7 startup nation
 

Recently uploaded

社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 

Recently uploaded (20)

社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 

Mlcc #4

  • 1. MLCC #4 Neural Network Presented by Ofa 2018.7.18
  • 2. Agenda • Introduction to NN • Backpropogation • Training Neural Networks • Multi-Class NN • Embeddings • ML Engineering
  • 4. What is ANN? • First we may need to think about what is INTELLIGENCE?
  • 9. Non-linear Problems Linear solver + linear solver = linear solver!!!
  • 10. Non-linear Problems Real Case: TCM price Price from store(每錢) Price from origin(每⽄斤)
  • 11. ReLU non-linear function => nonlinear model
  • 12. Activation Functions More activation functions: https://www.tensorflow.org/api_guides/python/nn
  • 13. Playground - with 1 hidden layer node Linear activation Sigmoid activation ReLu activation
  • 14. Playground - with 2 hidden layer nodes Linear activation Sigmoid activation ReLu activation #sometimes shows another result
  • 15. Playground - challenge 0.177 loss First trial Remove empty nodes #L2 regularization is required
  • 16. Playground - initialization First trial Second trial #DIY
  • 17. Playground - Spiral First trial Second trial You can still only tuning the parameters to reach a good performance rather than doing feature engineering
  • 18. Playground - Spiral First trial Second trial It is a choice between more features and more computing power($$$)
  • 19. Programming Exercise OK.. it’s steps and batch size that matter…
  • 21. Backpropogation • How data flows through the graph. • How dynamic programming lets us avoid computing exponentially many paths through the graph.
  • 22. Backpropogation • Update weights according to the error • wij = wij - a*dE/dwij
  • 23. Backpropogation • Starting from the output layer! • d(1/2(youtput - y target)^2) = youtput - y target
  • 24. Backpropogation • Go backward for each node • dE/dwij = dxj/dwij *dE/dxj = yi *dE/dxj #cuzxj = yi *wij dE/dw46 = dx6/dw46 *dE/dx6 = y4 *dE/dx6
  • 25. –Trust me, it’s too complicated to understand “哩喜勒勒公三⼩小”
  • 26. Example Input Layer Hidden Layer Output Layer Bias = f(0.3825) = 0.5944 To compute sigmoid, you can use : https://goo.gl/Jiuw2p Try to compute yh2, o1, o2 by yourself Reference
  • 27. Example Input Layer Hidden Layer Output Layer Bias So we get: yh1= 0.5944 yh2 = 0.5968 o1 = f(1.106) = 0.7513 o2 = f(1.225) = 0.7729
  • 28. Example Input Layer Hidden Layer Output Layer Bias Then we can update weights: OutputO1 = 0.75 OutputO2 = 0.773 Etotal = EO1 + EO2 = 1/2*(0.01-0.75)^2 + 1/ 2*(.099 - 0.773)^2 = 0.74 w5new = w5old - a*dEtotal/dw5
  • 29. Example Then we can get: w5new = w5old - a*dEtotal/dw5 so, w5new = 0.4 - 0.5 * 0.082 = 0.359
  • 30. Example Then we can get: w5new = 0.359 w6new = 0.4 w7new = 0.51 w8new = 0.56 Next, we need to update the first layer, ie. w1~w4 Input Layer Hidden Layer Output Layer Bias
  • 31. Example Input Layer Hidden Layer Output Layer Bias #We’ve already computed in the previous layer Then we can also get : #
  • 32. Example Input Layer Hidden Layer Output Layer Bias
  • 33. Example Input Layer Hidden Layer Output Layer Bias Then we can get: w1new = 0.1497 w2new = 0.1995 w3new = 0.2497 w4new = 0.2995
  • 34. Brief Summary • You can do the update for all weights, just remember you need to update all the weight together instead of one-by-one. • That means you should always update the weights using old data, do not mix the old ones and new ones. • But, I think just call nn.train() would be the best way to do it!
  • 35. Training Neural Nets • Thing to note: • Gradients should be differentiable so we can learn from it. • Gradients can vanish and explode: additional layer, ReLUs / learning rate, batch normalization • Lower level gradient may go closer to zero that makes training slow. Use ReLU can prevent it. • If weights are too large, they may make lower level gradients explode. Use batch normalization to avoid it. • ReLU layers can die: learning rate
• 36. Dropout Regularization • Randomly drop out units in the network for a single gradient step. • The dropout rate ranges from 0.0 to 1.0; a rate of 1.0 drops out every node, so the model learns nothing! • This mechanism is one reason deep learning has become so useful in recent years.
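The mechanism can be sketched in a few lines of NumPy. This is the common "inverted dropout" formulation, not code from the course; the rescaling by 1/(1 - rate) keeps the expected activation unchanged:

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(x, rate):
    """Zero out a `rate` fraction of units and rescale the survivors."""
    if rate >= 1.0:
        return np.zeros_like(x)        # rate 1.0: drop everything, learn nothing
    mask = (rng.random(x.shape) >= rate).astype(x.dtype)
    return x * mask / (1.0 - rate)     # inverted-dropout rescaling

out = dropout(np.ones(10), 0.5)        # survivors of rate 0.5 are rescaled to 2.0
```

At inference time no units are dropped, and because of the rescaling no extra correction is needed.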
  • 37. Programming Exercises: Normalization This way is much simpler than the solution…
• 38. Programming Exercises: Optimizer AdagradOptimizer: automatically reduces the learning rate per parameter over time. RMSE = 122.29 / 124.10 AdamOptimizer: Adaptive Moment Estimation; computes adaptive learning rates for each parameter. RMSE = 67.67 / 67.48 Reference: http://ruder.io/optimizing-gradient-descent/
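Adagrad's "automatically reduce the learning rate" can be written out directly: each parameter divides its step by the square root of its accumulated squared gradients, so frequently-updated parameters take ever smaller steps. A sketch on a toy quadratic (the function and hyperparameters are made up for illustration, not from the exercise):

```python
import math

# Minimize f(w) = (w - 3)^2 with the Adagrad update rule
w, lr, eps = 0.0, 1.0, 1e-8
g_accum = 0.0

for _ in range(500):
    g = 2 * (w - 3)                            # gradient of f at w
    g_accum += g * g                           # accumulate squared gradients
    w -= lr * g / (math.sqrt(g_accum) + eps)   # per-parameter shrinking step

# w has converged close to the minimizer, 3
```

Adam layers two extra ideas on top of this: an exponential moving average of the gradient (momentum) and of its square, with bias correction, which is why it often converges faster in the exercise.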
• 39. Programming Exercises: Normalization+ You can pass the normalization into the function options, which makes it simpler. z_score, RMSE: 71.54 / 70.39 binary_threshold(0.5), RMSE: 115.78 / 116.41 clip(0.1, 0.8), RMSE: 115.77 / 116.33 log_normalize??? (math error)
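The normalizations compared above are simple element-wise transforms. A sketch of what each one does (the function names mirror the exercise, but these implementations are my own):

```python
import numpy as np

def z_score(x):
    return (x - x.mean()) / x.std()          # zero mean, unit variance

def binary_threshold(x, threshold):
    return (x > threshold).astype(float)     # 0/1 indicator

def clip(x, lo, hi):
    return np.clip(x, lo, hi)                # cap extreme values

def log_normalize(x):
    # Undefined for zero or negative inputs -- the likely source of
    # the "math error" noted on the slide.
    return np.log(x)

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
```

Thresholding and clipping throw away most of the feature's information here, which is consistent with their much worse RMSE in the exercise.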
  • 41. Multi-Class NN One-class NN Multi-Class NN
  • 42. See Food The ‘See Food’ app from Silicon Valley really happened, and it was also a lie “Meal Snap”
• 43. See Food • Multi-class, single-label: this is a hotdog, an octopus, or a banana • => softmax (with candidate sampling to keep training cheap) • Multi-class, multi-label: this picture contains a hotdog, cucumber, tomato, and onion • => one logistic regression per class
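The two output strategies differ only in the final layer. A minimal NumPy sketch (the logits are made-up scores for three classes):

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])   # hypothetical scores for 3 classes

# Single label: softmax makes the classes compete; probabilities sum to 1
def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

single_label = softmax(logits)

# Multi-label: one independent sigmoid per class, each its own yes/no question
multi_label = 1 / (1 + np.exp(-logits))
```

With softmax, raising one class's probability necessarily lowers the others; with per-class sigmoids, the picture can be "hotdog" and "tomato" at the same time.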
• 47. Collaborative Filtering Step 1. Preprocessing: build a dict of all movies Step 2. Encode the user behavior into a sparse representation
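The two steps can be sketched directly (the movie titles and the watch history are made-up examples):

```python
# Step 1: build a vocabulary dict mapping each movie to a column index
vocab = {"Shrek": 0, "The Incredibles": 1, "The Dark Knight": 2, "Memento": 3}

# Step 2: encode one user's behavior as a sparse representation --
# just the indices of the movies they watched
watched = ["Shrek", "The Dark Knight"]
sparse = sorted(vocab[m] for m in watched)   # [0, 2]

# Equivalent dense multi-hot vector, shown for comparison; with a real
# catalog of millions of movies, only the sparse form is practical
dense = [1.0 if i in sparse else 0.0 for i in range(len(vocab))]
```

The sparse form stores only the handful of watched indices instead of one slot per movie in the catalog, which is what makes the representation scale.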
• 49. Embeddings • Embed the data into a d-dimensional space, mapping items to low-dimensional real vectors • the number of dimensions is usually determined empirically • embeddings can be learned as hidden layers of the network
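An embedding layer is just a learned matrix: looking up item i means selecting row i, which is the same as multiplying a one-hot vector by the matrix. A sketch with a random (untrained) table; the vocabulary size and d = 3 are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d = 5, 3                      # d is usually chosen empirically
E = rng.normal(size=(vocab_size, d))      # embedding table, learned during training

# Embedding lookup for item 2 = selecting row 2
vec = E[2]

# Equivalently: a one-hot vector times the embedding matrix
one_hot = np.zeros(vocab_size)
one_hot[2] = 1.0
assert np.allclose(one_hot @ E, vec)
```

This equivalence is why an embedding can be treated as an ordinary hidden layer: it is a linear layer whose input happens to be sparse.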
  • 51. PCA Reference: [1] https://goo.gl/XetAUb (right) [2] https://goo.gl/HctuRj (left, including python examples)
• 52. Word2Vec Ray Hsueh gave an awesome talk on this last week!!!
  • 54. Production ML Systems What we’ve learned so far…
• 55. Static vs Dynamic Training • Static - trained offline. For data that do not change much over time. • Pros: easy to build and test; batch train, then test and iterate until good • Cons: requires monitoring inputs; easy to let the model grow stale • Dynamic - trained online. • Pros: keep feeding in data and regularly sync out an updated version; use progressive validation rather than batch training & testing; adapts to changes • Cons: needs monitoring, model rollback & data quarantine capabilities
• 56. Static vs Dynamic Inference • Static - inference offline. For data that do not change much over time. • Pros: much lower computational cost • Cons: you need all the data at hand, and update latency can be very long • Dynamic - inference online. • Pros: can make predictions on the newest data • Cons: serving latency is higher, and you need budget to deal with it
• 57. Data Dependencies • Feature and data changes have a huge impact on the model • Unit tests for data? • Reliability: what if the input data disappears? • Versioning: does the feature change over time? • Necessity: how useful is the feature relative to its computational cost? • Correlations: are features tied together, or can they be teased apart? • Feedback loops: could my input be impacted by my own output?
• 59. Cancer Prediction • Hospitals specializing in cancer treatment make the model overfit: the hospital name effectively gives away the answer • => label leakage, just like a cheat
• 60. Real World Guidelines • Keep the very first model extremely simple • Focus on data pipeline correctness • Use a simple, observable metric for training & evaluation • Own and monitor your input features • Treat your model configuration as code: review it, check it in • Write down the results of all experiments, especially "failures"
  • 61. Good Bye! Machine Learning Practica Check out these real-world case studies of how Google uses machine learning in its products, with video and hands-on coding exercises: • Image Classification: See how Google developed the image classification model powering search in Google Photos, and then build your own image classifier. • More Machine Learning Practica coming soon! Other Machine Learning Resources • Deep Learning: Advanced machine learning course on neural networks, with extensive coverage of image and text models • Rules of ML: Best practices for machine learning engineering • TensorFlow.js: WebGL-accelerated, browser-based JavaScript library for training and deploying ML models