This document discusses clustering methods using the EM algorithm. It begins with an overview of machine learning and unsupervised learning. It then describes clustering, k-means clustering, and how k-means can be formulated as an optimization of a biconvex objective function solved via an iterative EM algorithm. The document goes on to describe mixture models and how the EM algorithm can be used to estimate the parameters of a Gaussian mixture model (GMM) via maximum likelihood.
1. Clustering methods via EM algorithm
Network Intelligence and Analysis Lab
2014.07.10
Sanghyuk Chun
2. Machine Learning and Unsupervised Learning
• Machine Learning
  • Training data
  • Learning model
• Unsupervised Learning
  • Training data without labels
  • Input data: $D = \{x_1, x_2, \dots, x_N\}$
  • Most unsupervised learning problems try to find hidden structure in unlabeled data
  • Examples: clustering, dimensionality reduction (PCA, LDA), …
3. Unsupervised Learning and Clustering
• Clustering
  • Grouping objects in such a way that objects in the same group are more similar to each other than to objects in other groups
  • Input: a set of objects (or data) without group information
  • Output: a cluster index for each object
  • Usage: customer segmentation, image segmentation, …
[Figure: input data → clustering algorithm → clustered output]
5. K-means Clustering
• Intuition: data in the same cluster have a shorter distance to each other than to data in other clusters
• Goal: minimize the distance between data in the same cluster
• Objective function (sketched in code below):
  • $J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2$
  • where N is the number of data points and K is the number of clusters
  • $r_{nk} \in \{0, 1\}$ are indicator variables describing which of the K clusters the data point $\mathbf{x}_n$ is assigned to
  • $\boldsymbol{\mu}_k$ is a prototype associated with the k-th cluster
  • Eventually $\boldsymbol{\mu}_k$ equals the center (mean) of cluster k
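As a concrete illustration, here is a minimal NumPy sketch of this objective (the function name kmeans_objective and the array layout are my own, not from the slides):

```python
import numpy as np

def kmeans_objective(X, mu, r):
    """J = sum_n sum_k r_nk * ||x_n - mu_k||^2.

    X: (N, d) data points, mu: (K, d) cluster prototypes,
    r: (N, K) one-hot indicator variables r_nk.
    """
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    return float((r * sq_dists).sum())
```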
6. K-means Clustering – Optimization
• Objective function:
  • $\operatorname*{arg\,min}_{\{r_{nk},\, \boldsymbol{\mu}_k\}} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2$
• This function can be minimized through an iterative procedure
  • Step 1: minimize J with respect to the $r_{nk}$, keeping $\boldsymbol{\mu}_k$ fixed
  • Step 2: minimize J with respect to the $\boldsymbol{\mu}_k$, keeping $r_{nk}$ fixed
  • Repeat Steps 1 and 2 until convergence
• Does it always converge?
7. Optional – Biconvex Optimization
• Biconvex optimization is a generalization of convex optimization where the objective function and the constraint set can be biconvex
  • $f(x, y)$ is biconvex if, fixing x, $f_x(y) = f(x, y)$ is convex over Y and, fixing y, $f_y(x) = f(x, y)$ is convex over X
• One way to solve a biconvex optimization problem is to iteratively solve the corresponding convex subproblems (a toy sketch follows below)
  • This does not guarantee the globally optimal point
  • But it always converges to some local optimum
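A toy sketch of this alternating scheme, using the biconvex function $f(x, y) = (xy - 1)^2 + x^2 + y^2$ (my own example, not from the slides); each partial minimization is a convex quadratic with a closed-form solution:

```python
def alternate_convex_search(x, y, n_iters=100):
    # f(x, y) = (x*y - 1)**2 + x**2 + y**2 is biconvex but not jointly convex.
    for _ in range(n_iters):
        x = y / (y * y + 1.0)  # argmin_x f(x, y) with y fixed
        y = x / (x * x + 1.0)  # argmin_y f(x, y) with x fixed
    return x, y  # converges to a stationary point, not necessarily global
```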
8. K-means Clustering – Optimization
• $\operatorname*{arg\,min}_{\{r_{nk},\, \boldsymbol{\mu}_k\}} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2$
• Step 1: minimize J with respect to the $r_{nk}$, keeping $\boldsymbol{\mu}_k$ fixed
  • $r_{nk} = \begin{cases} 1 & \text{if } k = \operatorname*{arg\,min}_j \lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2 \\ 0 & \text{otherwise} \end{cases}$
• Step 2: minimize J with respect to the $\boldsymbol{\mu}_k$, keeping $r_{nk}$ fixed
  • Setting the derivative with respect to $\boldsymbol{\mu}_k$ to zero gives
  • $2 \sum_n r_{nk} (\mathbf{x}_n - \boldsymbol{\mu}_k) = 0$
  • $\boldsymbol{\mu}_k = \dfrac{\sum_n r_{nk} \mathbf{x}_n}{\sum_n r_{nk}}$
  • $\boldsymbol{\mu}_k$ is equal to the mean of all the data assigned to cluster k (both steps are assembled into code below)
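Putting the two steps together, a minimal NumPy implementation might look like the following sketch (initializing prototypes by sampling K data points is one common choice; the slides do not prescribe an initialization):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """K-means via the two alternating minimization steps above."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()  # initial prototypes
    for _ in range(n_iters):
        # Step 1: r_nk = 1 for the nearest prototype, 0 otherwise
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        assign = d2.argmin(axis=1)
        # Step 2: mu_k = mean of the points assigned to cluster k
        for k in range(K):
            if np.any(assign == k):  # keep the old prototype if a cluster is empty
                mu[k] = X[assign == k].mean(axis=0)
    return mu, assign
```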
9. K-means Clustering – Conclusion
• Advantages of K-means clustering
  • Easy to implement (kmeans in Matlab, kcluster in Python; see the usage sketch below)
  • In practice, it works well
• Disadvantages of K-means clustering
  • It can converge to a local optimum
  • Computing the Euclidean distance of every point is expensive
    • Solution: batch K-means
  • Euclidean distance is not robust to outliers
    • Solution: K-medoids algorithms (use a different metric)
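For reference, a typical off-the-shelf call in Python today uses scikit-learn (my substitution; the slides name Matlab's kmeans and Python's kcluster instead):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))  # toy data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```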
10. Mixture of Gaussians
• Mixture Model
• EM Algorithm
• EM for Gaussian Mixtures
11. Mixture of Gaussians
• Assumption: there are k components $\{c_i\}_{i=1}^{k}$
• Component $c_i$ has an associated mean vector $\mu_i$
• Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\Sigma_i$
[Figure: data drawn from five Gaussian components with means $\mu_1, \dots, \mu_5$]
12. Gaussian Mixture Model
• Represent the model as a linear combination of Gaussians
• Probability density function of a GMM (sketched in code below):
  • $p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$
  • $\mathcal{N}(x \mid \mu_k, \Sigma_k) = \dfrac{1}{(2\pi)^{d/2} \lvert \Sigma_k \rvert^{1/2}} \exp\!\left\{ -\tfrac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right\}$
• This is called a mixture of Gaussians, or Gaussian Mixture Model
• Each Gaussian density is called a component of the mixture and has its own mean $\mu_k$ and covariance $\Sigma_k$
• The parameters $\pi_k$ are called mixing coefficients ($\sum_k \pi_k = 1$)
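A direct sketch of this density using SciPy (gmm_pdf is an illustrative name, not from the slides):

```python
from scipy.stats import multivariate_normal

def gmm_pdf(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=S)
               for pi, mu, S in zip(pis, mus, Sigmas))
```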
13. Clustering using a Mixture Model
• $p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)$, where $\sum_k \pi_k = 1$
• Input:
  • The training set: $\{x_i\}_{i=1}^{N}$
  • Number of clusters: k
• Goal: model this data using a mixture of Gaussians
  • Mixing coefficients $\pi_1, \pi_2, \dots, \pi_k$
  • Means and covariances: $\mu_1, \mu_2, \dots, \mu_k; \; \Sigma_1, \Sigma_2, \dots, \Sigma_k$
14. Maximum Likelihood of a GMM
• $p(x \mid G) = p(x \mid \pi_1, \mu_1, \dots) = \sum_i p(x \mid c_i)\, p(c_i) = \sum_i \pi_i \mathcal{N}(x \mid \mu_i, \Sigma_i)$
• $p(x_1, x_2, \dots, x_N \mid G) = \prod_i p(x_i \mid G)$
• The log-likelihood function is given by (a numerically stable evaluation is sketched below)
  • $\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$
• Goal: find the parameters which maximize the log-likelihood
• Problem: the maximum likelihood solution is hard to compute directly
• Solution: use the EM algorithm
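Evaluating this log-likelihood naively can underflow; a stable sketch works in log space with logsumexp (my choice of implementation, not discussed on the slides):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pis, mus, Sigmas):
    """ln p(X) = sum_n ln sum_k pi_k * N(x_n | mu_k, Sigma_k)."""
    # log_comp[n, k] = ln pi_k + ln N(x_n | mu_k, Sigma_k)
    log_comp = np.stack(
        [np.log(pi) + multivariate_normal.logpdf(X, mean=mu, cov=S)
         for pi, mu, S in zip(pis, mus, Sigmas)], axis=1)
    return float(logsumexp(log_comp, axis=1).sum())
```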
15. EM (Expectation Maximization) Algorithm
• The EM algorithm is an iterative procedure for finding the MLE
• An expectation (E) step creates a function for the expectation of the log-likelihood, evaluated using the current estimate of the parameters
• A maximization (M) step computes parameters maximizing the expected log-likelihood found in the E step
• These parameter estimates are then used to determine the distribution of the latent variables in the next E step
• EM always converges to a local optimum
16. K-means Revisited: EM and K-means
• $\operatorname*{arg\,min}_{\{r_{nk},\, \boldsymbol{\mu}_k\}} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2$
• E-step: minimize J with respect to the $r_{nk}$, keeping $\boldsymbol{\mu}_k$ fixed
  • $r_{nk} = \begin{cases} 1 & \text{if } k = \operatorname*{arg\,min}_j \lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2 \\ 0 & \text{otherwise} \end{cases}$
• M-step: minimize J with respect to the $\boldsymbol{\mu}_k$, keeping $r_{nk}$ fixed
  • $\boldsymbol{\mu}_k = \dfrac{\sum_n r_{nk} \mathbf{x}_n}{\sum_n r_{nk}}$
17. Latent Variable for a GMM
• Let $z_k$ be a Bernoulli random variable with probability $\pi_k$
  • $p(z_k = 1) = \pi_k$, where $\sum_k z_k = 1$ and $\sum_k \pi_k = 1$
• Because z uses a 1-of-K representation, its distribution has the form
  • $p(z) = \prod_{k=1}^{K} \pi_k^{z_k}$
• Similarly, the conditional distribution of x given a particular value of z is a Gaussian
  • $p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k}$
18. Latent Variable for a GMM
• The joint distribution is given by $p(x, z) = p(z)\, p(x \mid z)$ (a sampling sketch follows below)
  • $p(x) = \sum_z p(z)\, p(x \mid z) = \sum_k \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)$
• Thus the marginal distribution of x is a Gaussian mixture of the above form
• Now we are able to work with the joint distribution instead of the marginal distribution
• Graphical representation of a GMM for a set of N i.i.d. data points $\{x_n\}$ with corresponding latent variables $\{z_n\}$, where $n = 1, \dots, N$
[Figure: plate-notation graphical model — $\boldsymbol{\pi} \to \mathbf{z}_n \to \mathbf{x}_n \leftarrow \boldsymbol{\mu}, \boldsymbol{\Sigma}$, with the plate repeated N times]
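The latent-variable view gives a direct recipe for sampling from the model: draw $z_n$ from a categorical distribution with probabilities $\boldsymbol{\pi}$, then draw $x_n$ from the selected component. A minimal sketch (sample_gmm is an illustrative name):

```python
import numpy as np

def sample_gmm(n, pis, mus, Sigmas, seed=0):
    """Draw z_n ~ Categorical(pi), then x_n ~ N(mu_{z_n}, Sigma_{z_n})."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pis), size=n, p=pis)
    X = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
    return X, z
```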
19. EM for Gaussian Mixtures (E-step)
• Conditional probability of z given x
• From Bayes' theorem (vectorized in code below),
  • $\gamma(z_k) \equiv p(z_k = 1 \mid \mathbf{x}) = \dfrac{p(z_k = 1)\, p(\mathbf{x} \mid z_k = 1)}{\sum_{j=1}^{K} p(z_j = 1)\, p(\mathbf{x} \mid z_j = 1)} = \dfrac{\pi_k \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$
• $\gamma(z_k)$ can also be viewed as the responsibility that component k takes for 'explaining' the observation x
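Vectorized over all N data points, the E step might be sketched as follows (responsibilities is an illustrative name):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pis, mus, Sigmas):
    """gamma[n, k] = pi_k N(x_n|mu_k, Sigma_k) / sum_j pi_j N(x_n|mu_j, Sigma_j)."""
    dens = np.stack([pi * multivariate_normal.pdf(X, mean=mu, cov=S)
                     for pi, mu, S in zip(pis, mus, Sigmas)], axis=1)  # (N, K)
    return dens / dens.sum(axis=1, keepdims=True)
```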
20. EM for Gaussian Mixtures (M-step)
• Likelihood function for a GMM:
  • $\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$
• Setting the derivatives of the log-likelihood with respect to the means $\boldsymbol{\mu}_k$ of the Gaussian components to zero, we obtain
  • $\boldsymbol{\mu}_k = \dfrac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, \mathbf{x}_n$, where $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$
21. EM for Gaussian Mixtures (M-step)
• Setting the derivatives of the log-likelihood with respect to $\boldsymbol{\Sigma}_k$ to zero, we obtain
  • $\boldsymbol{\Sigma}_k = \dfrac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^\top$
• Maximizing the likelihood with respect to the mixing coefficients $\pi_k$ using a Lagrange multiplier,
  • $\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)$
  • we obtain $\pi_k = \dfrac{N_k}{N}$
22. EM for Gaussian Mixtures
• The updates for $\mu_k$, $\Sigma_k$, and $\pi_k$ do not constitute a closed-form solution for the parameters of the mixture model, because the responsibilities $\gamma(z_{nk})$ depend on those parameters in a complex way:
  • $\gamma(z_{nk}) = \dfrac{\pi_k \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$
• In the EM algorithm for a GMM, the $\gamma(z_{nk})$ and the parameters are iteratively optimized
  • In the E step, the responsibilities (posterior probabilities) are evaluated using the current values of the parameters
  • In the M step, the means, covariances, and mixing coefficients are re-estimated using the results of the E step
23. EM for Gaussian Mixtures
• Initialize the means $\boldsymbol{\mu}_k$, covariances $\boldsymbol{\Sigma}_k$, and mixing coefficients $\pi_k$, and evaluate the initial value of the log-likelihood
• E step: evaluate the responsibilities using the current parameters
  • $\gamma(z_{nk}) = \dfrac{\pi_k \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$
• M step: re-estimate the parameters using the current responsibilities
  • $\boldsymbol{\mu}_k^{\text{new}} = \dfrac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, \mathbf{x}_n$
  • $\boldsymbol{\Sigma}_k^{\text{new}} = \dfrac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (\mathbf{x}_n - \boldsymbol{\mu}_k^{\text{new}})(\mathbf{x}_n - \boldsymbol{\mu}_k^{\text{new}})^\top$
  • $\pi_k^{\text{new}} = \dfrac{N_k}{N}$, where $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$
• Repeat the E step and M step until convergence (assembled into code below)
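Assembled into code, the full loop might look like this sketch (random initialization from the data and the small covariance jitter are my own choices for numerical stability; the slides do not prescribe them):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=50, seed=0):
    """EM for a GMM, following the E and M step updates above."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(N, size=K, replace=False)].copy()     # init means from data
    Sigmas = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)  # shared initial cov
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E step: gamma[n, k] = responsibility of component k for x_n
        dens = np.stack([pi * multivariate_normal.pdf(X, mean=mu, cov=S)
                         for pi, mu, S in zip(pis, mus, Sigmas)], axis=1)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate means, covariances, and mixing coefficients
        Nk = gamma.sum(axis=0)                               # N_k = sum_n gamma_nk
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        pis = Nk / N
    return pis, mus, Sigmas, gamma
```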
24. Relationship between the K-means Algorithm and GMM
• We can derive the K-means algorithm as a particular limit of EM for a Gaussian mixture model
• Consider a Gaussian mixture model whose covariance matrices are given by $\varepsilon I$, where $\varepsilon$ is a variance parameter and I is the identity matrix
• If we consider the limit $\varepsilon \to 0$, the expected complete-data log-likelihood of the GMM becomes (see the derivation below)
  • $\mathbb{E}_z[\ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\pi})] \to -\dfrac{1}{2} \sum_n \sum_k r_{nk} \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2 + C$
• Thus we see that, in this limit, maximizing the expected complete-data log-likelihood is equivalent to the K-means algorithm
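One step the slide leaves implicit: with $\Sigma_k = \varepsilon I$, the responsibilities sharpen into the hard K-means assignments as $\varepsilon \to 0$, because the component with the smallest squared distance dominates both numerator and denominator. In the notation above:

```latex
\gamma(z_{nk})
  = \frac{\pi_k \exp\!\left(-\lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2 / 2\varepsilon\right)}
         {\sum_{j=1}^{K} \pi_j \exp\!\left(-\lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2 / 2\varepsilon\right)}
  \;\xrightarrow{\;\varepsilon \to 0\;}\;
  r_{nk} =
  \begin{cases}
    1 & \text{if } k = \arg\min_j \lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2 \\
    0 & \text{otherwise}
  \end{cases}
```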