This document provides an overview of machine learning algorithms and techniques. It discusses classification and regression metrics, naive Bayesian classifiers, clustering methods like k-means, ensemble learning techniques like bagging and boosting, the expectation maximization algorithm, restricted Boltzmann machines, neural networks including convolutional and recurrent neural networks, and word embedding techniques like Word2Vec, GloVe, and matrix factorization. Key algorithms and their applications are summarized at a high level.
3. Discriminative & Generative Models
A model can be a decision function Y = f(X) or a conditional probability P(Y|X).
Generative models: learn the joint distribution P(X,Y), then compute the posterior P(Y|X).
Examples: HMM, naive Bayesian classifier.
Discriminative models: learn P(Y|X) (or f(X)) directly.
Examples: decision trees, neural networks, SVM, boosting, CRF.
4. Metrics - Classification
Accuracy
Precision & Recall (generally for binary classification)
Confusion matrix: TP (Pos->Pos), FP (Neg->Pos), FN (Pos->Neg), TN (Neg->Neg)
Precision = TP / (TP + FP), Recall = TP / (TP + FN)
F1 score: the harmonic mean of precision and recall; F1 is high only when both P and R are high.
5. Metrics - Regression
Mean Absolute Error (MAE) & Mean Squared Error (MSE)
R-squared (coefficient of determination): R² = 1 − SS_res / SS_tot
A higher R² means a better fit.
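A minimal sketch with hypothetical numbers, computing MAE, MSE and R² with numpy.

```python
# Regression metrics on a tiny made-up example.
import numpy as np

y_true = np.array([3.0, 1.5, 2.0, 7.0])
y_pred = np.array([2.5, 1.0, 2.2, 8.0])

mae = np.mean(np.abs(y_true - y_pred))
mse = np.mean((y_true - y_pred) ** 2)
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
print(mae, mse, r2)
```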
6. Dataset ML Strategy
From Andrew Ng, NIPS 2016.
Bias-variance trade-off: total error ≈ avoidable bias + variance.
Remedies: more data; regularization; a new model.
7. Naive Bayesian Classifier
Two ingredients:
Bayes' theorem
Features are conditionally independent given the class
How does it work?
Learn the joint probability distribution p(x,y), i.e. the prior p(y) and the likelihood p(x|y).
For a new x, calculate the posterior p(y|x) via Bayes' theorem.
9. Naïve Bayesian: Attribute Conditional Independence Assumption
Each input x is a vector with n elements.
Given the class y, the features are conditionally independent.
(Diagram: class node Y with the feature nodes x1, x2, ..., xn as its children.)
12. Naïve Bayesian: Definition
We want the class with the maximum posterior probability:
y = argmax_c P(Y=c | X=x) = argmax_c P(Y=c) ∏_j P(X_j = x_j | Y=c) / P(X=x)
The denominator P(X=x) is the same for every class, so we can drop it:
y = argmax_c P(Y=c) ∏_j P(X_j = x_j | Y=c)
13. Naïve Bayesian: Parameter Estimation
Maximum Likelihood Estimation (MLE)
Goal: estimate the prior P(Y=c) for each class,
and the likelihood P(X=x | Y=c) for each feature value and class.
MLE is simply counting: P(x|y) = Count(x,y) / Count(y)
14. Naïve Bayesian: Cost
Choose the 0-1 loss function:
right answer → loss 0, wrong answer → loss 1.
The loss L measures how badly we are mistaken; we want to minimize its expectation.
The expected loss adds up the cases we get wrong, i.e. it equals 1 − P(correct).
So minimizing the expected 0-1 loss ⇔ maximizing the posterior probability of the chosen class.
16. Naïve Bayesian: Bayesian Estimation
With MLE a count can be 0, and one zero factor in the product makes the whole posterior 0.
Bayesian estimation adds a constant λ ≥ 0 to every count: λ = 0 recovers MLE, λ = 1 is Laplace smoothing.
The smoothed estimates still form a valid probability distribution:
P_λ(X_j = a | Y=c) = (Count(X_j=a, Y=c) + λ) / (Count(Y=c) + S_j λ), where S_j is the number of values feature j can take.
Bayesian estimation of the prior: P_λ(Y=c) = (Count(Y=c) + λ) / (N + K λ), where K is the number of classes.
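A minimal sketch, not the slides' code, of a categorical naive Bayes trained by counting with Laplace smoothing (λ = 1); the tiny dataset is hypothetical.

```python
# Naive Bayes by counting, with Laplace smoothing.
from collections import Counter, defaultdict

def train_nb(X, y, lam=1.0):
    classes = Counter(y)                      # Count(Y=c)
    n, n_feat = len(y), len(X[0])
    values = [set(row[j] for row in X) for j in range(n_feat)]  # S_j values per feature
    cond = defaultdict(Counter)               # cond[(j, c)][v] = Count(X_j=v, Y=c)
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            cond[(j, c)][v] += 1
    prior = {c: (cnt + lam) / (n + lam * len(classes)) for c, cnt in classes.items()}

    def predict(x):
        best, best_p = None, -1.0
        for c, cnt in classes.items():
            p = prior[c]
            for j, v in enumerate(x):
                p *= (cond[(j, c)][v] + lam) / (cnt + lam * len(values[j]))
            if p > best_p:
                best, best_p = c, p
        return best
    return predict

predict = train_nb([["S", 1], ["M", 1], ["M", 0], ["L", 0]], ["neg", "neg", "pos", "pos"])
print(predict(["M", 1]))
```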
17. Bayesian: More Models
Naïve Bayesian: attribute conditional independence assumption.
Semi-naïve Bayesian: One-Dependent Estimator (ODE), where each attribute may depend on at most one other attribute besides the class.
(Diagrams: naive Bayesian — Y points to x1, x2, ..., xn; Super-Parent One-Dependent Estimator (SPODE) — Y points to all attributes, and one super-parent attribute additionally points to the others.)
18. Takeaway (1): Probabilistic Graphical Models
(Diagram: naive Bayesian as a graphical model — node Y with edges to x1, x2, ..., xn; the class node carries the factor P(Y) and each edge carries a factor P(xi|Y).)
The product P(y)P(x1|y)P(x2|y)...P(xn|y) is the joint probability — exactly what we want!
Find out more: https://www.coursera.org/learn/probabilistic-graphical-models (Prof. Daphne Koller)
19. Takeaway (2): Bayesian Network
Also named belief network.
DAG: Directed Acyclic Graph; CPT: Conditional Probability Table.
(Diagram: A and B are parent nodes; A → C, A → D, B → D, B → E; each node carries its CPT: P(A), P(B), P(C|A), P(D|A,B), P(E|B).)
The joint probability factorizes as P(A)P(B)P(C|A)P(D|A,B)P(E|B) — exactly what we want!
Given A, the children C and D are conditionally independent: C⊥D|A.
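A minimal sketch of the factorized joint probability above; the CPT numbers are hypothetical.

```python
# Joint probability of the small Bayesian network A, B, C, D, E from its CPTs.
P_A = {True: 0.6, False: 0.4}
P_B = {True: 0.3, False: 0.7}
P_C_given_A = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
P_D_given_AB = {(True, True): {True: 0.9, False: 0.1},
                (True, False): {True: 0.5, False: 0.5},
                (False, True): {True: 0.4, False: 0.6},
                (False, False): {True: 0.05, False: 0.95}}
P_E_given_B = {True: {True: 0.7, False: 0.3}, False: {True: 0.2, False: 0.8}}

def joint(a, b, c, d, e):
    # P(A)P(B)P(C|A)P(D|A,B)P(E|B)
    return (P_A[a] * P_B[b] * P_C_given_A[a][c]
            * P_D_given_AB[(a, b)][d] * P_E_given_B[b][e])

print(joint(True, False, True, False, True))
```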
20. Clustering: Unsupervised Learning
Similarity measures: Euclidean distance, etc.
Within a cluster: high similarity
Between clusters: high distance
Methods
Prototype-based clustering: k-means, LVQ, MoG
Density-based clustering: DBSCAN
Hierarchical clustering: AGNES, DIANA
22. K-means: Algorithm
(Input: the whole dataset of points x1, ..., xm)
Initialization: randomly place centroids c1, ..., ck.
Repeat until convergence (stop when no point changes its cluster):
- for each data point xi:
  find the nearest centroid, argmin_j D(xi, cj), and assign the point to that centroid's cluster
- for each cluster:
  recompute the centroid as the mean of the points now assigned to it
Complexity: O(#iterations × #clusters × #instances × #dimensions).
Demo video: https://www.youtube.com/watch?v=_aWzGGNrcic
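A minimal numpy sketch of the k-means loop above (not the slides' code); the toy data is generated at random.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]    # random initialization
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assignment step: nearest centroid for every point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centroid becomes the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):               # convergence: no change
            break
        centroids = new_centroids
    return centroids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5])
centroids, labels = kmeans(X, k=2)
print(centroids)
```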
23. K-means: A Quick Demo
(Figure: two clusters; the squares are the centroids.)
Calculate the new centroids to finish one iteration.
25. Clustering: Other Methods (1)
Prototype-based clustering
LVQ: learns a set of prototype vectors to describe the clusters, using class labels.
MoG: describes the clusters with a probabilistic model, a mixture of Gaussians.
Density-based clustering
DBSCAN
27. Ensemble Learning
Strong learner: a model with good performance;
Weak learner: a model only slightly better than random guessing.
Ensemble learning methods:
Sequential (iterative): boosting
Parallel: bagging
28. Boosting: Algorithm
Boosting: combine weak learners into a strong one,
through majority voting for classification and averaging for regression.
Algorithm:
1) Feed N data points to train a weak learner h1.
2) Feed N data points to train another learner h2; these N points include the errors of h1 plus new data never trained on before.
3) Repeat to train hn, each time drawing N points from the previous errors and new data.
4) Final model: h_final = MajorityVote(h1, h2, ..., hn).
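The steps above describe boosting generically; as a concrete, runnable illustration, here is a minimal AdaBoost example using scikit-learn's AdaBoostClassifier on a synthetic dataset (not the slides' code).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 weak learners (shallow decision trees), combined by weighted voting.
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```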
29. Bagging: Bootstrap AGGregatING
Bootstrap sampling: draw m samples with replacement from the m training points, so the bootstrap sets overlap. If we sample m times, the chance that a given point is never selected is (1 − 1/m)^m → 1/e ≈ 36.8%.
Training: the base learners are trained in parallel.
Ensemble: voting for classification; averaging for regression.
Random Forest: a variant of bagging.
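A minimal sketch checking the ≈36.8% out-of-bag fraction by simulation (not from the slides).

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000
idx = rng.integers(0, m, size=m)        # one bootstrap sample: m draws with replacement
out_of_bag = m - len(np.unique(idx))    # points never selected
print(out_of_bag / m, "≈", 1 / np.e)    # both close to 0.368
```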
30. Random Forest: Random Attribute Selection
Bagging: a bootstrap sample of the data is used to train each decision tree.
Random attribute selection:
at every node of every tree, select a random subset of k attributes and split on the best of them.
Empirical choice: k = log2(d), where d is the total number of attributes.
31. Expectation Maximization: Motivation
Expectation Maximization (EM):
- an iterative method to maximize the likelihood function;
- typically used to estimate the parameters of statistical models with hidden (latent) variables.
Example:
Three coins: A, B and C. In each trial, first toss coin A. If it comes up heads, toss B; otherwise toss C. Record only the result of that second toss. Repeat the trial n times independently.
The heads probabilities of A, B and C are 𝝅, 𝘱 and 𝘲.
If we can only observe the results, not the process, how can we estimate the heads probabilities of the coins?
32. Expectation Maximization
Consider one trial: we observe the result y (1 or 0) but not the hidden outcome z of coin A. Let 𝛳 = (𝝅, 𝘱, 𝘲) be the parameters of the model. The probability of observing y is
P(y | 𝛳) = 𝝅 𝘱^y (1−𝘱)^(1−y) + (1−𝝅) 𝘲^y (1−𝘲)^(1−y).
By Bayes' theorem, the posterior probability that coin A came up heads (the "responsibility" used in the E-step) is
μ = 𝝅 𝘱^y (1−𝘱)^(1−y) / P(y | 𝛳).
33. Expectation Maximization
Extend to all n trials: the observed results are Y = (y1, ..., yn) and the non-observed values are Z = (z1, ..., zn).
The likelihood of the observed data over the whole sequence is
P(Y | 𝛳) = ∏_j [ 𝝅 𝘱^(y_j) (1−𝘱)^(1−y_j) + (1−𝝅) 𝘲^(y_j) (1−𝘲)^(1−y_j) ].
Now use the EM algorithm to estimate the parameters.
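A minimal sketch (not the slides' code) of EM for the three-coin model, with hypothetical observations and starting values.

```python
import numpy as np

y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])   # hypothetical observed tosses
pi, p, q = 0.5, 0.6, 0.5                        # initial guesses for (pi, p, q)

for _ in range(50):
    # E-step: posterior probability that coin A was heads for each observation
    num = pi * p**y * (1 - p)**(1 - y)
    den = num + (1 - pi) * q**y * (1 - q)**(1 - y)
    mu = num / den
    # M-step: re-estimate the parameters from the responsibilities
    pi = mu.mean()
    p = (mu * y).sum() / mu.sum()
    q = ((1 - mu) * y).sum() / (1 - mu).sum()

print(pi, p, q)
```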
38. Restricted Boltzmann Machine
Basic concepts:
h: state vector of the hidden units (0/1)
v: state vector of the visible units (0/1)
b: bias vector of the hidden units
a: bias vector of the visible units
W: weight matrix between the hidden and visible units
39. Restricted Boltzmann Machine: Fundamental Theory
Energy between the visible units v and the hidden units h:
E(v, h) = −aᵀv − bᵀh − vᵀWh
Probability distribution defined through the energy function:
P(v, h) = e^(−E(v, h)) / Z
Z is the partition function (a term from physics); it is just a normalizing constant in the RBM.
40. RBM: Contrastive Divergence (Training)
Activation probabilities:
hidden units: P(h_j = 1 | v) = σ(b_j + Σ_i v_i W_ij)
visible units: P(v_i = 1 | h) = σ(a_i + Σ_j W_ij h_j)
Given an input x ⇒ v1:
- compute the activation probability of every hidden unit from v1: p(h1 | v1)
- use Gibbs sampling to draw a sample representing the whole hidden layer: h1 ~ p(h1 | v1)
- compute the activation probability of every visible unit from h1: p(v2 | h1)
- use Gibbs sampling to draw a sample representing the whole visible layer: v2 ~ p(v2 | h1)
- compute the activation probability of every hidden unit from v2: p(h2 | v2)
- then update: ΔW ∝ v1 p(h1|v1)ᵀ − v2 p(h2|v2)ᵀ, Δa ∝ v1 − v2, Δb ∝ p(h1|v1) − p(h2|v2)
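A minimal numpy sketch of one CD-1 update as described above; the network size, data and learning rate are made up for illustration (not the slides' code).

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 6, 3, 0.1
W = rng.normal(0, 0.01, (n_vis, n_hid))    # weights
a = np.zeros(n_vis)                         # visible biases
b = np.zeros(n_hid)                         # hidden biases
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1_step(v1):
    p_h1 = sigmoid(b + v1 @ W)              # P(h=1 | v1)
    h1 = (rng.random(n_hid) < p_h1) * 1.0   # Gibbs sample of the hidden layer
    p_v2 = sigmoid(a + h1 @ W.T)            # P(v=1 | h1)
    v2 = (rng.random(n_vis) < p_v2) * 1.0   # Gibbs sample of the visible layer
    p_h2 = sigmoid(b + v2 @ W)              # P(h=1 | v2)
    # parameter updates: positive phase minus negative phase
    return lr * (np.outer(v1, p_h1) - np.outer(v2, p_h2)), lr * (v1 - v2), lr * (p_h1 - p_h2)

v = np.array([1, 0, 1, 0, 0, 1], dtype=float)
for _ in range(100):
    dW, da, db = cd1_step(v)
    W += dW; a += da; b += db
print(sigmoid(b + v @ W))                   # hidden activations after training
```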
41. Deriving the Activation Probability (take the hidden units as the example)
- For a hidden unit k, we want its conditional activation probability given the visible units.
- Introduce two quantities that split the energy E(v, h) into the part where j equals k and the part where j does not equal k.
- It is then easy to see that only the part involving h_k matters for the conditional probability; the rest cancels out.
42. Deriving the Activation Probability
When we compute a hidden unit for contrastive divergence, the visible units are already known, and the other hidden units are treated as known as well.
- First, apply Bayes' rule to write the conditional probability.
- Because every other hidden unit's state is either 0 or 1, sum the joint probability over those states.
- Combine this with the probability distribution defined by the energy function to obtain the activation probability.
44. Back Propagation: Main Purpose
The main purpose of back propagation is to find the partial derivatives ∂C/∂w,
where w stands for all parameters in the network, including weights and biases.
Cost function (quadratic cost):
C = (1/2n) Σ_x ‖y(x) − a(x)‖²,
where n is the number of samples, y(x) is the target and a(x) is the output of the network.
Note:
It is useful to first point out the naive forward propagation algorithm implied by the chain rule. We can then see the advantage of back propagation simply by comparing the two algorithms.
45. Naive Forward Propagation
The naive forward propagation algorithm uses the chain rule to compute the partial derivatives at every node of the network, in a forward manner.
Basic assumption: if u is a node at level t+1 and z is any node at level ≤ t whose output is an input to u, then computing ∂u/∂z takes unit time on our computer.
How it works:
1. Compute ∂ui/∂uj for every pair of nodes where ui is at a higher level than uj.
2. To compute ∂ui/∂uj, we need ∂uk/∂uj for every node uk that is an input of ui. Thus the total work in the algorithm is O(V·E)
(V = number of nodes, E = number of edges).
46. Back Propagation - Calculate Error
If we slightly change the weighted input z of a node, it will affect the result of the next layer and finally affect the output.
Assume ∂C/∂z is close to 0; then changing the value of z will not help us minimize the cost C, and in this case we can say the node is close to optimal.
Naturally we can define the error of node j in layer l as:
δ_j^l = ∂C/∂z_j^l
47. Back Propagation - Calculate Error
Using the chain rule, we can deduce the error of the output layer:
δ_j^L = (∂C/∂a_j^L) σ'(z_j^L)
Rewriting the partial derivatives in vector form:
δ^L = ∇_a C ⊙ σ'(z^L),
where ∇_a C is a vector with elements equal to ∂C/∂a_j^L and ⊙ is the Hadamard (element-wise) operator.
48. Back Propagation - Propagating the Error Backwards
From layer l+1 to layer l, we try to express δ^l in terms of δ^{l+1}:
(1) δ_j^l = Σ_k δ_k^{l+1} (∂z_k^{l+1} / ∂z_j^l)
(2) z_k^{l+1} = Σ_j w_kj^{l+1} a_j^l + b_k^{l+1}, so ∂z_k^{l+1}/∂z_j^l = w_kj^{l+1} σ'(z_j^l)
By combining (1) and (2):
δ^l = ((W^{l+1})ᵀ δ^{l+1}) ⊙ σ'(z^l)
49. Back Propagation - From Error to Parameters
After calculating the errors, we need one final step:
use the errors to compute the derivatives with respect to the parameters (weights and biases).
Given z_j^l = Σ_k w_jk^l a_k^{l-1} + b_j^l:
For a bias: ∂C/∂b_j^l = δ_j^l
For a weight: ∂C/∂w_jk^l = a_k^{l-1} δ_j^l
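A minimal numpy sketch (not the slides' code) that applies the backpropagation equations above to a tiny two-layer network with sigmoid activations and quadratic cost; sizes and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
sigmoid_prime = lambda z: sigmoid(z) * (1 - sigmoid(z))

# network: 3 inputs -> 4 hidden -> 2 outputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x = np.array([0.5, -0.2, 0.1])
y = np.array([1.0, 0.0])
lr = 0.5

for _ in range(200):
    # forward pass
    z1 = W1 @ x + b1; a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2; a2 = sigmoid(z2)
    # output error: delta^L = grad_a C ⊙ sigma'(z^L), with C = 1/2 ||y - a||^2
    delta2 = (a2 - y) * sigmoid_prime(z2)
    # back-propagate: delta^l = (W^{l+1}^T delta^{l+1}) ⊙ sigma'(z^l)
    delta1 = (W2.T @ delta2) * sigmoid_prime(z1)
    # gradients: dC/db = delta, dC/dW = outer(delta, a_prev)
    W2 -= lr * np.outer(delta2, a1); b2 -= lr * delta2
    W1 -= lr * np.outer(delta1, x);  b1 -= lr * delta1

print(sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2))   # output approaches y
```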
51. Convolutional Neural Network - Pooling Layer
The main idea of the max-pooling layer is to capture the most important activation (the maximum over time).
Q: This operation shrinks the number of features (from n − h + 1 to 1); how can we get more features?
A: Apply multiple filters with different window sizes and different weights.
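A minimal sketch, not from the slides, of a 1-D convolution over word vectors followed by max-over-time pooling, with several window sizes; all values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 7, 5                                    # sentence length, word-vector dimension
sentence = rng.normal(size=(n, d))             # hypothetical word vectors

def conv_max_pool(x, window, n_filters):
    """Apply n_filters of width `window`, then take the max over time."""
    W = rng.normal(size=(n_filters, window * d))
    # each position produces one activation per filter -> (n - window + 1) features per filter
    feats = np.array([W @ x[i:i + window].ravel() for i in range(len(x) - window + 1)])
    return feats.max(axis=0)                   # max over time: one value per filter

# multiple window sizes give more features, as the slide suggests
pooled = np.concatenate([conv_max_pool(sentence, w, n_filters=3) for w in (2, 3, 4)])
print(pooled.shape)                            # (9,) = 3 window sizes x 3 filters
```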
52. Convolutional Neural Network - Multi-channel
Start with two copies of the word vectors (word2vec or GloVe). One copy is fine-tuned during training, which changes its vector values; the other copy is kept static.
Apply the same filter to both channels, then sum their Ci values before max pooling.
53. Convolutional Neural Network - Dropout
Create a masking vector r of random 0/1 variables to delete some of the features, preventing co-adaptation (overfitting).
Kim (2014) reports 2-4% improved accuracy and the ability to use very large networks without overfitting.
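A minimal sketch of a Bernoulli dropout mask r applied to a feature vector; the inverted-dropout rescaling is an assumption not stated in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=10)      # hypothetical pooled features
keep_prob = 0.5

r = (rng.random(features.shape) < keep_prob).astype(float)   # masking vector of 0/1
dropped = features * r / keep_prob                            # rescale so expectations match
print(r, dropped)
```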
54. Word Vectors - Taxonomy
Idea: use a big "graph" with a tree-like structure to define the relationships between words.
Famous example: WordNet
Disadvantages:
Difficult to maintain (when new words come in).
Requires human labour.
Hard to compute word similarity.
56. Word Vectors - Window-Based Co-occurrence Matrix
Problems:
Space consuming.
Difficult to update.
Solution:
Reduce the dimension (Singular Value Decomposition).
57. Word Vectors - Word2vec
Word2vec: predicts the neighboring words of each word.
Previous approach: capturing co-occurrence statistics.
Advantages:
Faster, and can easily incorporate a new sentence/document or add a word to the vocabulary.
Good representations (analogies can be solved by vector subtraction).
59. Word Vectors - Word2vec
Problem:
With large vocabularies this objective function is not scalable and trains too slowly (or use GloVe instead).
Solution:
Negative sampling.
62. Word Vectors - Continuous Bag of Words
Unlike the skip-gram model, this model tries to predict the center word from the surrounding words.
The result will be slightly different from the skip-gram model; by averaging the two we can get a better result.
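A minimal usage sketch, not from the slides: training skip-gram and CBOW vectors with gensim; parameter names follow gensim 4.x and the toy corpus is hypothetical.

```python
from gensim.models import Word2Vec

sentences = [["the", "movie", "was", "great"],
             ["the", "film", "was", "boring"],
             ["a", "great", "film"]]

# sg=1 -> skip-gram with negative sampling; sg=0 -> CBOW
skipgram = Word2Vec(sentences, vector_size=50, window=2, sg=1, negative=5, min_count=1)
cbow = Word2Vec(sentences, vector_size=50, window=2, sg=0, min_count=1)

print(skipgram.wv.most_similar("movie", topn=2))
print(cbow.wv["film"][:5])
```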
64. Word Vectors - GloVe
Collect the co-occurrence statistics from the whole corpus instead of going over one window at a time.
Then optimize the vectors using the following objective:
J = Σ_{i,j} f(X_ij) (w_iᵀ w̃_j + b_i + b̃_j − log X_ij)²,
where X_ij is the co-occurrence count from the matrix and f is a weighting function.
Fast training (no need to go over every window), scalable to huge corpora,
good performance even with a small corpus.
65. Word Vectors - GloVe Overview
Word-word co-occurrence matrix X: V × V (V is the vocabulary size).
X_ij: the number of times word j occurs in the context of word i.
X_i = Σ_t X_it: the number of times any word appears in the context of word i.
P_ij = P(j|i) = X_ij / X_i: the probability that word j appears in the context of word i.
Example: P(solid | ice) is the probability that the word "solid" appears in the context of the word "ice".
GloVe: Global Vectors for Word Representation
72. Recurrent Neural Network - Loss Function
The cross-entropy at one single time step t:
J^(t) = −Σ_{j∈V} y_{t,j} log ŷ_{t,j}
Overall cross-entropy cost:
J = −(1/T) Σ_{t=1}^{T} Σ_{j∈V} y_{t,j} log ŷ_{t,j},
where V is the vocabulary and T is the length of the text.
Example (vocabulary: the, a, movie):
target y_t = [0.3, 0.6, 0.1]
prediction ŷ_1 = [0.001, 0.009, 0.9]
prediction ŷ_2 = [0.001, 0.299, 0.7]
prediction ŷ_3 = [0.001, 0.9, 0.009]
J_i = −Σ_j y_{t,j} log ŷ_{i,j}
J_3 < J_2 < J_1, because ŷ_3 is closer to y_t than the others, so its cross-entropy loss is the smallest.
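A minimal sketch verifying the cross-entropy example above with numpy.

```python
import numpy as np

y_t = np.array([0.3, 0.6, 0.1])                 # target distribution over (the, a, movie)
preds = {"y1": np.array([0.001, 0.009, 0.9]),
         "y2": np.array([0.001, 0.299, 0.7]),
         "y3": np.array([0.001, 0.9, 0.009])}

for name, y_hat in preds.items():
    ce = -np.sum(y_t * np.log(y_hat))           # J = -sum_j y_j log(y_hat_j)
    print(name, round(ce, 3))                   # y3 gets the smallest loss
```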
78. SVD: Singular Value Decomposition
Start from matrix multiplication: Y = A·B.
Check codes and plots here.
We want to find the directions and the extent to which the matrix stretches vectors:
79. SVD: Singular Value Decomposition
Eigenvalues and eigenvectors of A: A·v = λ·v
(There can be more than one pair.)
Import numpy to calculate them (see the sketch below).
But this only works for square matrices!
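A minimal sketch of the numpy calculation mentioned above, on a small hypothetical matrix.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)    # columns of `eigenvectors` are the v's
print(eigenvalues)                               # 3.0 and 1.0 (order may vary)

# check A v = lambda v for the first pair
v, lam = eigenvectors[:, 0], eigenvalues[0]
print(np.allclose(A @ v, lam * v))               # True
```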
80. SVD: Singular Value Decomposition
If A has n eigenvalue/eigenvector pairs, stack the eigenvectors as the columns of Q and put the eigenvalues on the diagonal of Σ.
Then A·Q = Q·Σ, so A = Q·Σ·Q⁻¹ (the eigendecomposition).
81. SVD: Singular Value Decomposition
What about a non-square matrix A of size m × n?
Similar idea: A = U·Σ·Vᵀ.
But how do we get the "eigenvalues" and "eigenvectors" here?
82. SVD: Singular Value Decomposition
Find a square matrix! AᵀA (n × n) and AAᵀ (m × m) are square.
Calculate their eigenvectors and eigenvalues; these give the singular vectors and the squared singular values, so A = U·Σ·Vᵀ.
Let r << the number of singular values and keep only the r largest to represent A: A ≈ U_r·Σ_r·V_rᵀ.
Computing the full decomposition costs O(n³).
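A minimal numpy sketch of a rank-r truncated SVD of a non-square matrix (hypothetical data).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 40))                 # hypothetical m x n matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = 5                                          # keep only the r largest singular values
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
print(np.linalg.norm(A - A_r) / np.linalg.norm(A))   # relative reconstruction error
```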
83. SVD: Application
Lossy compression!
Reduce the parameter count!
If m = 1,000 and n = 1,000, storing A takes 10^6 parameters.
With r = 10, storing U_r, Σ_r and V_r takes only 1,000·10 + 10·10 + 10·1,000 = 20,100 parameters.
84. MF: Matrix Factorization in RecSys
Different from SVD: in MF we let Y = U·V, splitting the matrix into just two factors. We will walk through it with a recommender system.
http://www.dataperspective.info/2014/05/basic-recommendation-engine-using-r.html
(Figure: a users × movies ratings matrix.)
85. MF: Matrix Factorization in RecSys
Rating(user i, movie j) = (user i vector) · (movie j vector), i.e. R_ij ≈ U_i · M_j.
So how do we find the user matrix and the movie matrix?
(Figure: the users × movies ratings matrix factorized as a user matrix times a movie matrix; entry R_ij comes from row i of the user matrix and column j of the movie matrix.)
86. MF: Matrix Factorization in RecSys
Predicted rating of user i for movie j: R̂_ij = U_i · M_j.
Loss function (squared error plus L2 regularization):
L = Σ_{(i,j) observed} (R_ij − U_i · M_j)² + λ (‖U‖² + ‖M‖²).
We want U, M = argmin L, found with SGD.
Once U and M are computed, we can predict any rating!
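A minimal sketch, not the slides' code, of matrix factorization trained with SGD on the observed ratings of a tiny hypothetical matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 3, 0, 1],        # 0 means "not rated"
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
n_users, n_movies, k = R.shape[0], R.shape[1], 2
U = rng.normal(scale=0.1, size=(n_users, k))
M = rng.normal(scale=0.1, size=(n_movies, k))
lr, lam = 0.01, 0.02

observed = [(i, j) for i in range(n_users) for j in range(n_movies) if R[i, j] > 0]
for _ in range(5000):
    i, j = observed[rng.integers(len(observed))]
    ui, mj = U[i].copy(), M[j].copy()
    err = R[i, j] - ui @ mj                      # prediction error on one observed rating
    U[i] += lr * (err * mj - lam * ui)           # SGD step with L2 regularization
    M[j] += lr * (err * ui - lam * mj)

print(np.round(U @ M.T, 2))                      # predicted ratings, including the missing ones
```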
89. Useful Links
Stanford CS224d: Deep Learning for Natural Language Processing
Stanford CS231n: Convolutional Neural Networks for Visual Recognition
SVM notes: http://www.robots.ox.ac.uk/~az/lectures/ml/lect2.pdf
Nuts and Bolts of Applying Deep Learning (Andrew Ng) - YouTube
LVQ: learning vector quantization
MoG: mixture of Gaussians
DBSCAN: density-based spatial clustering of applications with noise
AGNES: AGglomerative NESting
DIANA: DIvisive ANAlysis