This document provides an introduction to machine learning and neural networks. It discusses key concepts like supervised vs unsupervised learning, classification vs regression problems, and performance evaluation metrics. It also covers foundational machine learning techniques like k-nearest neighbors for classification and regression. Descriptive statistics concepts like mean, variance, correlation and covariance are introduced. Finally, it discusses visualizing data through scatter plots and histograms.
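The k-nearest neighbors technique mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the slides: the function names (`k_nearest`, `knn_classify`, `knn_regress`), the Euclidean distance, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k (features, target) pairs whose features are closest to query."""
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]

def knn_classify(train, query, k=3):
    """Classification: majority vote among the labels of the k nearest neighbors."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Regression: average of the target values of the k nearest neighbors."""
    targets = [target for _, target in k_nearest(train, query, k)]
    return sum(targets) / len(targets)

# Toy data, made up for illustration only.
labelled = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
            ((4.0, 4.2), "b"), ((4.1, 3.9), "b"), ((3.8, 4.0), "b")]
valued = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0), ((10.0,), 20.0)]

print(knn_classify(labelled, (1.1, 1.0)))
print(knn_regress(valued, (2.1,), k=3))
```

The same neighbor search serves both problem types; only the way the neighbors' targets are combined (vote versus average) differs.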
This presentation describes two major papers on multivariate time series using deep neural networks. The first paper, DeepAR, was developed at Amazon to forecast millions of items, where the same model can be applied to millions of products. DeepAR is implemented as a built-in algorithm of Amazon SageMaker. A code example is provided.
The second paper, Long- and Short-Term Temporal Patterns with Deep Neural Networks, was developed at CMU and introduces a novel way to detect both short-term and long-term seasonality in data through the introduction of a skip-RNN.
A Gluon implementation of the paper is provided in the presentation.
Introduction to the Perceptron and how it is used in machine learning and artificial neural networks.
This presentation was prepared by Zaid Al-husseini as a lecture for third-stage undergraduate students in the Software department, Faculty of IT, University of Babylon, Iraq.
It is publicly available for beginners to learn, in theory and mathematically, how the Perceptron works.
Notice: the slides are not detailed and need a teacher to explain them in depth.
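As a rough companion to the description above, the classic perceptron learning rule can be sketched as follows. This is not taken from the slides: the function names, the step activation with threshold 0, the learning rate of 1.0, and the AND-gate data are assumptions chosen to keep the arithmetic exact.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum plus bias is non-negative."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation >= 0 else 0

def train_perceptron(samples, lr=1.0, epochs=20):
    """Perceptron rule: nudge weights and bias by lr * (target - prediction)."""
    n_features = len(samples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Logical AND is linearly separable, so the rule converges on it.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, x) for x, _ in and_gate])  # [0, 0, 0, 1]
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is why AND works here while XOR would not.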
Artificial Intelligence Course | AI Tutorial For Beginners | Artificial Intel...Simplilearn
This Artificial Intelligence presentation will help you understand what is Artificial Intelligence, types of Artificial Intelligence, ways of achieving Artificial Intelligence and applications of Artificial Intelligence. In the end, we will also implement a use case on TensorFlow in which we will predict whether a person has diabetes or not. Artificial Intelligence is a method of making a computer, a computer-controlled robot or a software think intelligently in a manner similar to the human mind. AI is accomplished by studying the patterns of the human brain and by analyzing the cognitive process. Artificial Intelligence is emerging as the next big thing in the technology field. Organizations are adopting AI and budgeting for certified professionals in the field, thus the demand for trained and certified professionals in AI is increasing. As this new field continues to grow, it will have an impact on everyday life and lead to considerable implications for many industries. Now, let us deep dive into the AI tutorial video and understand what is this Artificial Intelligence all about and how it can impact human life.
The topics covered in this Artificial Intelligence presentation are as follows:
1. What is Artificial intelligence?
2. Types of Artificial intelligence
3. Ways of achieving artificial intelligence
4. Applications of Artificial intelligence
5. Use case - Predicting if a person has diabetes or not
Simplilearn’s Artificial Intelligence course provides training in the skills required for a career in AI. You will master TensorFlow, Machine Learning and other AI concepts, plus the programming languages needed to design intelligent agents, deep learning algorithms & advanced artificial neural networks that use predictive analytics to solve real-time decision-making problems without explicit programming.
Why learn Artificial Intelligence?
The current and future demand for AI engineers is staggering. The New York Times reports a candidate shortage for certified AI Engineers, with fewer than 10,000 qualified people in the world to fill these jobs, which according to Paysa earn an average salary of $172,000 per year in the U.S. (or Rs.17 lakhs to Rs. 25 lakhs in India) for engineers with the required skills.
Those who complete the course will be able to:
1. Master the concepts of supervised and unsupervised learning
2. Gain practical mastery over principles, algorithms, and applications of machine learning through a hands-on approach which includes working on 28 projects and one capstone project.
3. Acquire thorough knowledge of the mathematical and heuristic aspects of machine learning.
4. Understand the concepts and operation of support vector machines, kernel SVM, Naive Bayes, decision tree classifier, random forest classifier, logistic regression, K-nearest neighbors, K-means clustering and more.
Comprehend the theoretic
Learn more at: https://www.simplilearn.com
This knolx is about an introduction to machine learning, wherein we see the basics of various different algorithms. This knolx isn't a complete intro to ML but can be a good starting point for anyone who wants to start in ML. In the end, we will take a look at the demo wherein we will analyze the FIFA dataset going through the understanding of various data analysis techniques and use an ML algorithm to derive 5 players that are similar to each other.
Machine Learning Ml Overview Algorithms Use Cases And ApplicationsSlideTeam
"You can download this product from SlideTeam.net"
Machine Learning ML Overview Algorithms Use Cases and Applications is for the mid level managers giving information about Machine Learning, how Machine Learning works, Machine Learning algorithms and its use cases. You can also learn the difference between Machine learning vs Traditional programming to understand how to implement machine learning in a better way for business growth. https://bit.ly/2ZaVSG9
Introduction to machine learning. Basics of machine learning. Overview of machine learning. Linear regression. logistic regression. cost function. Gradient descent. sensitivity, specificity. model selection.
Machine Learning and Real-World ApplicationsMachinePulse
This presentation was created by Ajay, Machine Learning Scientist at MachinePulse, to present at a Meetup on Jan. 30, 2015. These slides provide an overview of widely used machine learning algorithms. The slides conclude with examples of real world applications.
Ajay Ramaseshan, is a Machine Learning Scientist at MachinePulse. He holds a Bachelors degree in Computer Science from NITK, Suratkhal and a Master in Machine Learning and Data Mining from Aalto University School of Science, Finland. He has extensive experience in the machine learning domain and has dealt with various real world problems.
Fundamental, An Introduction to Neural NetworksNelson Piedra
An Introduction to Neural Networks, eighth edition, 1996
Authors: Ben Kröse, Faculty of Mathematics & Computer Science, University of Amsterdam. Patrick van der Smagt, Institute of Robotics and Systems Dynamics, German Aerospace Research Establishment
Keynote: Nelson Piedra, Computer Sciences School - Advanced Tech, Technical University of Loja UTPL, Ecuador.
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...Edureka!
Machine Learning Training with Python: https://www.edureka.co/python
This Edureka Machine Learning tutorial (Machine Learning Tutorial with Python Blog: https://goo.gl/fe7ykh ) on "AI vs Machine Learning vs Deep Learning" talks about the differences and relationship between AI, Machine Learning and Deep Learning. Below are the topics covered in this tutorial:
1. AI vs Machine Learning vs Deep Learning
2. What is Artificial Intelligence?
3. Example of Artificial Intelligence
4. What is Machine Learning?
5. Example of Machine Learning
6. What is Deep Learning?
7. Example of Deep Learning
8. Machine Learning vs Deep Learning
Machine Learning Tutorial Playlist: https://goo.gl/UxjTxm
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Simplilearn
This presentation about backpropagation and gradient descent covers the basics of how backpropagation and gradient descent play a role in training neural networks, using the example of recognizing handwritten digits with a neural network. After predicting the results, you will see how to train the network using backpropagation to obtain results with high accuracy. Backpropagation is the process of updating the parameters of a network to reduce the error in prediction. You will also understand how to calculate the loss function to measure the error in the model. Finally, you will see, with the help of a graph, how to find the minimum of a function using gradient descent. Now, let’s get started with learning backpropagation and gradient descent in neural networks.
Why Deep Learning?
TensorFlow is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are being seen in smartphone applications, creating efficiencies in the power grid, driving advancements in healthcare, improving agricultural yields, and helping us find solutions to climate change. With this TensorFlow course, you’ll build expertise in deep learning models, learn to operate TensorFlow to manage neural networks and interpret the results.
And according to payscale.com, the median salary for engineers with deep learning skills tops $120,000 per year.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms. Those who complete the course will be able to:
1. Understand the concepts of TensorFlow, its main functions, operations and the execution pipeline
2. Implement deep learning algorithms, understand neural networks and traverse the layers of data abstraction which will empower you to understand data like never before
3. Master and comprehend advanced topics such as convolutional neural networks, recurrent neural networks, training deep networks and high-level interfaces
4. Build deep learning models in TensorFlow and interpret the results
5. Understand the language and fundamental concepts of artificial neural networks
6. Troubleshoot and improve deep learning models
7. Build your own deep learning project
8. Differentiate between machine learning, deep learning, and artificial intelligence
Learn more at https://www.simplilearn.com/deep-learning-course-with-tensorflow-training
A brief introduction to neural networks, including:
1. Fitting Tool
2. Clustering data with a self-organising map
3. Pattern Recognition Tool
4. Time Series Toolbox
Machine learning is the subfield of computer science that, according to Arthur Samuel in 1959, gives "computers the ability to learn without being explicitly programmed." Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms overcome following strictly static program instructions by making data-driven predictions or decisions through building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible; example applications include email filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), learning to rank, and computer vision.
Artificial intelligence based pattern recognition is
one of the most important tools in process control to identify
process problems. The objective of this study was to
evaluate the relative performance of a feature-based
recognizer compared with the raw data-based recognizer.
The study focused on recognition of seven commonly
researched patterns plotted on the quality chart. The
artificial intelligence based pattern recognizer trained using
the three selected statistical features resulted in significantly
better performance compared with the raw data-based
recognizer.
With these components in place, we present the Data
Science Machine — an automated system for generating
predictive models from raw data. It starts with a relational
database and automatically generates features to be used
for predictive modeling.
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...IAEME Publication
This paper presents an approach that applies an aggregated predictor, formed by multiple versions of a multilayer neural network with a back-propagation optimization algorithm, to help the engineer obtain a list of the most appropriate well-test interpretation models for a given set of pressure/production data. The proposed method consists of three stages: (1) data decorrelation through principal component analysis to reduce the covariance between the variables and the dimension of the input layer in the artificial neural network, (2) bootstrap replication of the learning set, in which the data are repeatedly sampled with a random split into training sets that are used as new learning sets, and (3) automatic reservoir model identification through an aggregated predictor formed by a plurality vote when predicting a new class. This method is described in detail to ensure successful replication of results. The required training and test datasets were generated using analytical solution models. In our case, 600 samples were used: 300 for training, 100 for cross-validation, and 200 for testing. Different network structures were tested during this study to arrive at an optimum network design. We notice that the single-net methodology always brings about confusion in selecting the correct model, even though the training results for the constructed networks are close to 1. We also notice that principal component analysis is an effective strategy for reducing the number of input features, simplifying the network structure, and lowering the training time of the ANN. The results show that the proposed model performs better when predicting new data, with a correlation coefficient of approximately 95%, compared with 80% for a previous approach. The combination of PCA and ANN is more stable and gives more accurate results with lower computational complexity than was previously feasible.
Clearly, the aggregated predictor is more stable and shows fewer bad classes compared to the previous approach.
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...ssuser7dcef0
Power plants release a large amount of water vapor into the
atmosphere through the stack. The flue gas can be a potential
source for obtaining much needed cooling water for a power
plant. If a power plant could recover and reuse a portion of this
moisture, it could reduce its total cooling water intake
requirement. One of the most practical ways to recover water
from flue gas is to use a condensing heat exchanger. The power
plant could also recover latent heat due to condensation as well
as sensible heat due to lowering the flue gas exit temperature.
Additionally, harmful acids released from the stack can be
reduced in a condensing heat exchanger by acid condensation.
Condensation of vapors in flue gas is a complicated
phenomenon since heat and mass transfer of water vapor and
various acids simultaneously occur in the presence of noncondensable
gases such as nitrogen and oxygen. Design of a
condenser depends on the knowledge and understanding of the
heat and mass transfer processes. A computer program for
numerical simulations of water (H2O) and sulfuric acid (H2SO4)
condensation in a flue gas condensing heat exchanger was
developed using MATLAB. Governing equations based on
mass and energy balances for the system were derived to
predict variables such as flue gas exit temperature, cooling
water outlet temperature, mole fraction and condensation rates
of water and sulfuric acid vapors. The equations were solved
using an iterative solution technique with calculations of heat
and mass transfer coefficients and physical properties.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERSveerababupersonal22
This presentation covers CW radar and FMCW radar, range measurement, the IF amplifier, and the FMCW altimeter. The CW radar operates using continuous wave transmission, while the FMCW radar employs frequency-modulated continuous wave technology. Range measurement is a crucial aspect of radar systems, providing information about the distance to a target. The IF amplifier plays a key role in signal processing, amplifying intermediate frequency signals for further analysis. The FMCW altimeter utilizes frequency-modulated continuous wave technology to accurately measure altitude above a reference point.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible with IDM8000 CCR. Backplane-mounted serial and TCP/Ethernet communication module for CCR remote access. IDM8000 CCR remote control over serial and TCP protocols.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy configuration using DIP switches.
6th International Conference on Machine Learning & Applications (CMLA 2024)ClaraZara1
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of machine learning.
HEAP SORT ILLUSTRATED WITH HEAPIFY, BUILD HEAP FOR DYNAMIC ARRAYS.
Heap sort is a comparison-based sorting technique based on the binary heap data structure. It resembles selection sort: we repeatedly find the minimum element and place it at the beginning, then repeat the same process for the remaining elements.
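The heapify and build-heap steps named in the title can be sketched as below. This is an illustrative version, not the code from the slides: it uses the common max-heap variant, which moves the largest remaining element to the end of the array on each pass (the mirror image of the minimum-first description), and the function names are assumptions.

```python
def heapify(arr, n, i):
    """Sift arr[i] down until the subtree rooted at i is a max-heap (first n items only)."""
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and arr[left] > arr[largest]:
            largest = left
        if right < n and arr[right] > arr[largest]:
            largest = right
        if largest == i:
            return
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest

def build_heap(arr):
    """Turn an arbitrary list into a max-heap, starting from the last parent node."""
    for i in range(len(arr) // 2 - 1, -1, -1):
        heapify(arr, len(arr), i)

def heap_sort(arr):
    """Sort arr in place in ascending order and return it."""
    build_heap(arr)
    for end in range(len(arr) - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]  # move current maximum to its final slot
        heapify(arr, end, 0)                 # restore the heap on the shrunken prefix
    return arr

print(heap_sort([12, 11, 13, 5, 6, 7]))  # [5, 6, 7, 11, 12, 13]
```

Because a Python list is a dynamic array, the heap lives in the same list being sorted and no extra storage is needed.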
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
4. Ch.1 ML Introduction
Objective: get “feel” and terminologies
1. What is machine learning? Concept and applications
2. What problems can ML solve?
Classification, regression and clustering
Supervised and unsupervised learning
3. Key elements of ML
Data, Model and Cost
4. Design steps and issues in performance evaluation
2019-09-26 Machine learning and artificial neural network
5. Machine learning: Introduction
Major applications
Pattern classification: Character/Speech recognition
Object detection and tracking
Time-series prediction (Stock price/market prediction,
weather forecast)
Sentence completion and language translation
… and much more
Problems in machine learning
Classification
Regression
Clustering
6. Machine learning: Introduction
Related fields
(Diagram) Fields related to machine learning: probability and statistics, data science, cognitive science, artificial intelligence, computer science, big data, data mining, linguistics, psychology and neuroscience, neural networks
7. Machine learning: Introduction
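Elements of machine learning in classification and regression problems
Prediction model ($\hat{y} = f(\mathbf{x}; \boldsymbol{\theta})$) with parameters ($\boldsymbol{\theta}$)
Data (observations $\mathbf{x}_i$ and their target values $y_i$)
Cost (loss)/objective function to minimize/maximize
Algorithm to efficiently obtain the optimal or a good solution
Data: $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$
Model with parameters: $\hat{y} = f(\mathbf{x}; \boldsymbol{\theta})$
Cost/loss: $J(\boldsymbol{\theta})$
Algorithm to solve $\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta})$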
8. Machine learning: Introduction
Machine learning process
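(Diagram) Existing data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ is fed to the machine learning algorithm, which produces $\boldsymbol{\theta}^*$. New data $\mathbf{x}$ is fed to the model (with parameters $\boldsymbol{\theta}^*$), which produces the prediction $\hat{y} = f(\mathbf{x}; \boldsymbol{\theta}^*)$.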
9. Machine learning: Introduction
Classification and regression
(Figure) Classification example: an image labeled “cat (smiling)”. Regression example: existing data plotted as y versus X; for new data, X = 2.5, y = ? and X = 6.0, y = ?
10. Machine learning: Introduction
Given an observation $\mathbf{x}$ (which can be a vector, a matrix (image) or a tensor)
Classification determines its class among a set of classes
Regression estimates/predicts unobserved variables
Regression can be a prediction of a future trend or an interpolation of some missing information
Classification vs. regression
In classification, $y$ is a discrete, categorical value drawn from a finite set
In regression, $y$ is a numerical value
11. Machine learning: Introduction
Machine learning is all about finding $f$ and $\boldsymbol{\theta}$
How do we find the best or, at least, a good $f$?
Given $f$, how do we find the best or, at least, a good $\boldsymbol{\theta}$?
Best or good for what, and in what sense?
Why do we need pre-collected data for learning/training?
12. Machine learning: Introduction
Some terminologies
Learning/Training/Model fitting: the process of finding the model parameters ($\boldsymbol{\theta}$) that best fit the given data in terms of the predefined cost/objective
Supervised learning: target values ($y_i$) are provided
• Classification, regression
Unsupervised learning: no target values provided
• Clustering
13. Machine learning: Introduction
Design steps (supervised learning)
1. Define the function you want to implement (define input $\mathbf{x}$ and output $y$)
2. Design your model $f(\mathbf{x}; \boldsymbol{\theta})$, intuitively and smartly
3. Collect data and curate them into the set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$
4. Train the model to get $\boldsymbol{\theta}^*$
5. Use $f(\mathbf{x}; \boldsymbol{\theta}^*)$ to evaluate the performance
6. If satisfied, you are done! Otherwise, go to step 2 (skip 3).
Step 2 requires some (often strong) mathematical background
Step 3 is typically time-consuming and sometimes requires domain expertise (e.g., for medical applications)
14. Machine learning: Introduction
Design steps for beginner (supervised learning)
1. Choose a function you want to implement (input/output formats are pre-defined)
2. Search for open software packages to choose/construct an appropriate model, and try modifying it slightly
3. Download a dataset ($\mathbf{x}_i$, $y_i$) from the internet
4. Use the packages to train the model to get $\boldsymbol{\theta}^*$
5. Use $\boldsymbol{\theta}^*$ to evaluate the performance
6. If satisfied, you are done! Otherwise, go to step 2 (skip 3).
15. Machine learning: Introduction
Parameters and hyperparameters
Most models have some hyperparameters that are fixed before training
They must be tuned for performance, computing cost, etc.
A grid search may be needed to find the best combination of hyperparameters.
16. Machine learning: Introduction
Performance evaluation of classifier/regressor
Must consider “generalization error”
Typical performance measures
Classification: Accuracy
Regression: Mean squared error (MSE), $R^2$ measure
Figure 1.1: Training and testing of a classifier/estimator
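The performance measures named above can be sketched in a few lines of plain Python. The function names and the toy labels and values below are illustrative assumptions, not taken from the slides.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical classifier output: 3 of 4 labels are correct.
acc = accuracy(["A", "A", "B", "B"], ["A", "B", "B", "B"])

# Hypothetical regressor output: only the last prediction is off (by 1).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```

To estimate the generalization error, these measures should be computed on test samples that were not used for training.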
17. Machine learning: Introduction
Clustering
No target values for observations
Objective is to divide data into a set of groups based on
some similarity measures
Need to devise procedures to efficiently group data
Data (distribution) visualization may help
Once clustered, the data can be used for classification
18. Machine learning: Introduction
Two typical similarity measures
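Euclidean distance: $d(\mathbf{x}_i, \mathbf{x}_j) = \|\mathbf{x}_i - \mathbf{x}_j\|$
Correlation: $\rho(\mathbf{x}_i, \mathbf{x}_j) = \dfrac{\mathbf{x}_i^T \mathbf{x}_j}{\|\mathbf{x}_i\| \, \|\mathbf{x}_j\|}$
Need to consider symmetry and their ranges
Note
L-p norm of a vector: $\|\mathbf{x}\|_p = \left( \sum_{m=1}^{M} |x_m|^p \right)^{1/p}$
Default value of p = 2
Cauchy-Schwarz inequality: $|\mathbf{x}_i^T \mathbf{x}_j| \le \|\mathbf{x}_i\| \, \|\mathbf{x}_j\|$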
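A minimal sketch of the two similarity measures, assuming the correlation is the normalized inner product of the two vectors. The helper names and the vectors `u` and `v` are hypothetical.

```python
import math


def lp_norm(a, p=2):
    """L-p norm of a vector; p = 2 gives the usual Euclidean length."""
    return sum(abs(ai) ** p for ai in a) ** (1.0 / p)


def euclidean_distance(a, b):
    """Length of the difference vector; 0 means identical, unbounded above."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def correlation(a, b):
    """Normalized inner product; Cauchy-Schwarz keeps it within [-1, 1]."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return dot / (lp_norm(a) * lp_norm(b))


u = [3.0, 4.0]
v = [4.0, 3.0]
d_uv = euclidean_distance(u, v)
rho_uv = correlation(u, v)
```

Both measures are symmetric in their arguments, but their ranges differ: a small distance means "similar", whereas a correlation close to 1 means "similar".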
19. Machine learning: Introduction
Simplest classifier: k nearest neighbor (knn) classifier
Training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ used as templates
Given new input data $\mathbf{x}$, it determines its class as follows
1. Compute $d(\mathbf{x}, \mathbf{x}_i)$ for all $i$ (may use another similarity measure)
2. Select the k candidates nearest to $\mathbf{x}$
3. Use a majority vote to determine the class of $\mathbf{x}$
(Diagram) Existing data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ and new data $\mathbf{x}$ are fed to the knn classifier, which outputs the prediction $\hat{y}$
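The three steps can be sketched as follows, using the Euclidean distance. The training points, labels and the function name `knn_classify` are made up for illustration.

```python
import math
from collections import Counter


def knn_classify(train_x, train_y, query, k=3):
    # 1. Distance from the query to every training sample (template).
    dists = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # 2. Keep the k nearest candidates.
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    # 3. Majority vote among the labels of the candidates.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set.
train_x = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

pred_near_a = knn_classify(train_x, train_y, (1.2, 1.4), k=3)
pred_near_b = knn_classify(train_x, train_y, (6.4, 6.6), k=3)
```

An odd k avoids ties in binary problems; k itself is a hyperparameter.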
20. Machine learning: Introduction
k nearest neighbor (knn) as regressor
Training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ used as templates
Given new input data $\mathbf{x}$, it estimates the target value as follows
1. Compute $d(\mathbf{x}, \mathbf{x}_i)$ for all $i$ (may use another similarity measure)
2. Select the k candidates nearest to $\mathbf{x}$
3. Take the average of the target values of the k candidates as the estimate
(Diagram) Existing data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ and new data $\mathbf{x}$ are fed to the knn regressor, which outputs the prediction $\hat{y}$
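A sketch of the same procedure for regression on one-dimensional inputs. The training values are hypothetical; only the query X = 2.5 echoes the earlier classification-and-regression figure.

```python
def knn_regress(train_x, train_y, query, k=2):
    # Rank the training samples by their distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    # The estimate is the average target value of the k nearest samples.
    return sum(train_y[i] for i in nearest) / k


# Hypothetical existing data (roughly y = X).
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

estimate = knn_regress(train_x, train_y, 2.5, k=2)  # averages the targets at X = 2 and X = 3
```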
21. Machine Learning and Neural Network
Ch.2: Data and descriptive statistics
Seokhyun Yoon, Electronics Eng., Dankook University
22. Ch.2 Data and descriptive statistics
Topics
1. Data: types and representation
2. Descriptive statistics
Scatter plot and histogram
Mean, correlation and covariance
23. Data and descriptive stat.
Terminologies and notation
Observation/sample/feature vector $\mathbf{x}$ (for now, assume that it is a vector).
Target value $y$: desired value for a sample
In supervised learning, $\mathbf{x}$ and $y$ should be paired ($\mathbf{x}_i$, $y_i$)
Collection of data: $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]$, $\mathbf{y} = [y_1, y_2, \ldots, y_N]^T$
Each column of $\mathbf{X}$ is a sample; each row is a feature
24. Data and descriptive stat.
Two types of data:
Categorical
Numerical
A categorical value is typically mapped to an integer to make it suitable for computation
e.g., T → 1, F → 0
Blood type: O → 0, A → 1, B → 2, AB → 3
25. Data and descriptive stat.
An example of multivariate data
Data consisting of 20 samples
Each column is one sample (a feature vector) with 4 features: Group, English, Math and Science score,
where Group is categorical and the others are numerical
sid 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Ave. SD.
Group A A A A A A A A A A B B B B B B B B B B
English 77 81 74 89 78 77 75 84 79 82 69 76 74 67 74 71 67 67 70 69 75 6.09
Math 72 67 74 64 71 67 72 68 75 70 83 78 76 82 80 77 83 84 80 80 75 6.09
Science 75 68 72 68 74 68 75 73 79 72 78 75 72 74 77 70 75 79 77 76 74 3.47
a sample/observation
26. Data and descriptive stat.
Example problems
Classification:
Given the scores (English, Math, Science) of a student, determine the Group
Regression:
Given the (Math, Science) scores of a student, estimate the English score
27. Data and descriptive stat.
Data visualization: scatter plot and histogram
Empirical (probability) density gives us lots of information for the design and performance of classifiers, regressors and clustering algorithms
One- or two-dimensional (bivariate) data is easy to visualize, while more than 2D is hard
Pairwise scatter plot is affordable for small M
28. Data and descriptive stat.
Problems in machine learning
(Figure) Example data distributions for classification, regression and clustering
Few probability distribution models can be successfully applied to practical datasets
That’s why we resort to machine learning based on a collection of samples
29. Data and descriptive stat.
Mean, Correlation and Covariance
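Consider a dataset with N samples and M features, where $x_{j,i}$ is feature $j$ of sample $i$
(Per-feature) mean: $m_j = \frac{1}{N} \sum_{i=1}^{N} x_{j,i}$
(Per-feature) variance: $\sigma_j^2 = \frac{1}{N} \sum_{i=1}^{N} (x_{j,i} - m_j)^2$
$\sigma_j$ is the standard deviation
$m_j$’s and $\sigma_j^2$’s can each be collectively represented as a vector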
30. Data and descriptive stat.
Mean, Correlation and Covariance
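Dataset with N samples and M features
Correlation (for a pair of features): $r_{jk} = \frac{1}{N} \sum_{i=1}^{N} x_{j,i} x_{k,i}$
Covariance (for a pair of features): $c_{jk} = \frac{1}{N} \sum_{i=1}^{N} (x_{j,i} - m_j)(x_{k,i} - m_k) = r_{jk} - m_j m_k$
$r_{jk} = r_{kj}$, $c_{jk} = c_{kj}$ (symmetric)
$r_{jk}$’s and $c_{jk}$’s can be collectively represented as matrices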
31. Data and descriptive stat.
Mean, Correlation matrix and Covariance matrix
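Consider a dataset with N samples and M features
Mean (vector): $\mathbf{m}_X = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i$
Correlation matrix: $\mathbf{R}_{XX} = \frac{1}{N} \mathbf{X}\mathbf{X}^T$ (size: $M \times M$)
Covariance matrix: $\mathbf{C}_{XX} = \mathbf{R}_{XX} - \mathbf{m}_X \mathbf{m}_X^T$
Cross correlation: $\mathbf{r}_{Xy} = \frac{1}{N} \mathbf{X}\mathbf{y}$ (size: $M \times 1$)
Cross covariance: $\mathbf{c}_{Xy} = \mathbf{r}_{Xy} - \mathbf{m}_X m_y$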
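As a quick numerical check of these definitions, the sketch below computes the mean, the (non-centered) correlation and the covariance for the English and Math scores of the 20-student table. The function names are illustrative, and the 1/N normalization is an assumption consistent with the per-feature formulas.

```python
english = [77, 81, 74, 89, 78, 77, 75, 84, 79, 82,
           69, 76, 74, 67, 74, 71, 67, 67, 70, 69]
math_score = [72, 67, 74, 64, 71, 67, 72, 68, 75, 70,
              83, 78, 76, 82, 80, 77, 83, 84, 80, 80]


def mean(v):
    return sum(v) / len(v)


def correlation(a, b):
    """r_ab = (1/N) * sum_i a_i * b_i (means are not removed)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def covariance(a, b):
    """c_ab = r_ab - m_a * m_b."""
    return correlation(a, b) - mean(a) * mean(b)


m_english = mean(english)
c_em = covariance(english, math_score)   # negative: high English goes with low Math
c_ee = covariance(english, english)      # the variance of the English score
```

Note that the table's SD column appears to use the N-1 normalization, so the square root of `c_ee` comes out slightly smaller than 6.09.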
32. Data and descriptive stat.
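Properties of $\mathbf{R}_{XX}$ (and $\mathbf{C}_{XX}$)
$\mathbf{R}_{XX}^T = \mathbf{R}_{XX}$ (symmetric)
$\mathbf{R}_{XX}$ is non-negative definite, such that, for any vector $\mathbf{a}$, $\mathbf{a}^T \mathbf{R}_{XX} \mathbf{a} \ge 0$
The eigenvalues are all non-negative and the eigenvectors form an orthonormal basis, i.e., with the eigen-decomposition $\mathbf{R}_{XX} = \mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^T$, the diagonal elements of $\boldsymbol{\Lambda}$ are all non-negative real and $\mathbf{U}^T \mathbf{R}_{XX} \mathbf{U} = \boldsymbol{\Lambda}$
If $N < M$ (the number of samples is less than the number of features), then $\mathbf{R}_{XX}$ has at most $N$ non-zero eigenvalues (all others are zero). In this case, $\mathbf{R}_{XX}$ is not invertible
These properties also hold for $\mathbf{C}_{XX}$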
33. Data and descriptive stat.
For the two data matrices
and ,
Find $\mathbf{m}_X$
Find $\mathbf{R}_{XX}$ and $\mathbf{C}_{XX}$
Check whether $\mathbf{R}_{XX}$ and $\mathbf{C}_{XX}$ satisfy the properties in the previous slide.
34. Data and descriptive stat.
Example (Problem 2.2)
Find the correlation and covariance between
• English and math
• English and science
• Math and science
Find $\mathbf{R}_{XX}$ and $\mathbf{C}_{XX}$
Check whether $\mathbf{R}_{XX}$ and $\mathbf{C}_{XX}$ satisfy the properties in the previous slide.
sid 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Ave. SD.
English 77 81 74 89 78 77 75 84 79 82 69 76 74 67 74 71 67 67 70 69 75 6.09
Math 72 67 74 64 71 67 72 68 75 70 83 78 76 82 80 77 83 84 80 80 75 6.09
Science 75 68 72 68 74 68 75 73 79 72 78 75 72 74 77 70 75 79 77 76 74 3.47
36. Machine Learning and Neural Network
Ch.3: Multi-variate Gaussian PDF and linear transform
Seokhyun Yoon, Electronics Eng., Dankook University
37. Ch.3 Multivariate Gaussian PDF & linear transform
Topics
1. Multi-variate Gaussian PDF
Pearson’s correlation coefficient
2. Linear transformation
Principal axes transform and whitening
3. Principal component analysis (PCA)
38. Multivariate Gaussian PDF: definition
Definition of multivariate Gaussian (Normal) PDF
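Consider a Gaussian random vector $\mathbf{x} = [x_1, x_2, \ldots, x_M]^T$
The PDF of $\mathbf{x}$ is defined, in general, as
$p(\mathbf{x}) = \dfrac{1}{(2\pi)^{M/2} |\mathbf{C}|^{1/2}} \exp\left( -\dfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$
where
$\boldsymbol{\mu} = E[\mathbf{x}]$ is the mean,
$\mathbf{C} = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$ is the covariance matrix (symmetric)
Note
Quadratic form $\mathbf{x}^T \mathbf{A} \mathbf{x}$ is a scalar ($\mathbf{x}$: $M \times 1$, $\mathbf{A}$: $M \times M$)
Mahalanobis distance (squared): $(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{x} - \boldsymbol{\mu})$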
39. Multivariate Gaussian PDF
3 cases of bivariate Gaussian (Normal) PDF
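Case 1: $\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 5 \end{bmatrix}$, $\mathbf{C} = \begin{bmatrix} 9 & 0 \\ 0 & 9 \end{bmatrix}$
Case 2: $\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 5 \end{bmatrix}$, $\mathbf{C} = \begin{bmatrix} 1 & 0 \\ 0 & 16 \end{bmatrix}$
Case 3: $\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 5 \end{bmatrix}$, $\mathbf{C} = \begin{bmatrix} 9 & -10 \\ -10 & 16 \end{bmatrix}$
(Figure) Contour plots of the three cases. The mean is just a “translation”.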
40. Multivariate Gaussian PDF
Let’s take a closer look
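“Contour” can be obtained from $(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = c'$
Suppose, for simplicity, that $M = 2$ and let $\mathbf{z} = \mathbf{x} - \boldsymbol{\mu} = [z_1, z_2]^T$
Suppose also that $\mathbf{C} = \begin{bmatrix} \sigma_1^2 & c_{12} \\ c_{12} & \sigma_2^2 \end{bmatrix}$
Then, we have
$\left( \dfrac{z_1}{\sigma_1} \right)^2 - 2\rho \dfrac{z_1}{\sigma_1} \dfrac{z_2}{\sigma_2} + \left( \dfrac{z_2}{\sigma_2} \right)^2 = c'(1 - \rho^2)$ (this is an ellipse)
where $\rho$ is the Pearson correlation coefficient defined as $\rho = \dfrac{c_{12}}{\sigma_1 \sigma_2}$, satisfying $-1 \le \rho \le 1$
We say
$x_1$ and $x_2$ are uncorrelated if $\rho = 0$
and have perfect correlation if $|\rho| = 1$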
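As a worked example, the Pearson correlation coefficient can be read off the covariance matrices of the three bivariate cases shown earlier, assuming the usual definition $\rho = c_{12} / (\sigma_1 \sigma_2)$:

```latex
\begin{align*}
\text{Case 1: } & \rho = \frac{0}{\sqrt{9}\,\sqrt{9}} = 0
  && \text{(circular contours)} \\
\text{Case 2: } & \rho = \frac{0}{\sqrt{1}\,\sqrt{16}} = 0
  && \text{(axis-aligned ellipse, elongated along } x_2\text{)} \\
\text{Case 3: } & \rho = \frac{-10}{\sqrt{9}\,\sqrt{16}} = -\frac{10}{12} \approx -0.83
  && \text{(tilted ellipse, negative correlation)}
\end{align*}
```

Only the third case has a non-zero off-diagonal term, which is why only its contours are tilted with respect to the coordinate axes.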
41. Multivariate Gaussian PDF
Examples
The Pearson correlation coefficient between two random variables (two features) $x_j$ and $x_k$ is defined as $\rho_{jk} = \dfrac{c_{jk}}{\sigma_j \sigma_k}$, satisfying $-1 \le \rho_{jk} \le 1$
We say that
$x_j$ and $x_k$ are uncorrelated if $\rho_{jk} = 0$
and have perfect correlation if $|\rho_{jk}| = 1$
42. Data and descriptive stat.
What can you see?
Are Math and English scores correlated?
What can you say about Math and English scores? Set up your hypothesis.
Use the figure in the previous page to roughly estimate the Pearson correlation coefficient.
43. Multivariate Gaussian PDF (for reference)
Marginalization of an M-variate Gaussian PDF is also a Gaussian PDF with (M-1) variates
$p(\mathbf{x}_{\setminus i}) = \int p(\mathbf{x}) \, dx_i$, where $\mathbf{x}_{\setminus i}$ is $\mathbf{x}$ with $x_i$ removed
Successive marginalization gives us a univariate Gaussian PDF
44. Linear transform
Definition of a linear transformation
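For any matrix $\mathbf{A}$ of size (K×M), the linear transform of a vector $\mathbf{x}$ of size (M×1) is defined as $\mathbf{y} = \mathbf{A}\mathbf{x}$
Linear transform is a projection of $\mathbf{x}$ onto the row space of $\mathbf{A}$
Linear transform of a Gaussian random vector
Suppose that $\mathbf{x}$ is a Gaussian random vector with mean $\boldsymbol{\mu}$ and covariance $\mathbf{C}$, i.e., $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$
Then, for any matrix $\mathbf{A}$, the linear transform $\mathbf{y} = \mathbf{A}\mathbf{x}$ is also Gaussian with mean $\mathbf{A}\boldsymbol{\mu}$ and covariance $\mathbf{A}\mathbf{C}\mathbf{A}^T$, i.e., $\mathbf{y} \sim \mathcal{N}(\mathbf{A}\boldsymbol{\mu}, \mathbf{A}\mathbf{C}\mathbf{A}^T)$
Try to verify this using the definitions of mean and covariance in Ch.2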
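One way to carry out the suggested verification is sketched below, using the expectation form of the mean and covariance (the sample averages of Ch.2 behave the same way because they are also linear). This shows only the mean and covariance; that $\mathbf{y}$ remains Gaussian is taken as given.

```latex
\begin{align*}
E[\mathbf{y}] &= E[\mathbf{A}\mathbf{x}] = \mathbf{A}\,E[\mathbf{x}] = \mathbf{A}\boldsymbol{\mu} \\
\mathrm{Cov}[\mathbf{y}]
  &= E\!\left[(\mathbf{y} - \mathbf{A}\boldsymbol{\mu})(\mathbf{y} - \mathbf{A}\boldsymbol{\mu})^T\right] \\
  &= E\!\left[\mathbf{A}(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\mathbf{A}^T\right] \\
  &= \mathbf{A}\,E\!\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\right]\mathbf{A}^T
   = \mathbf{A}\mathbf{C}\mathbf{A}^T
\end{align*}
```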
45. Linear transform
Principal axes transformation and Whitening
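Suppose that $\mathbf{C} = \mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^T$ (eigen-decomposition of $\mathbf{C}$),
$\boldsymbol{\Lambda}$: diagonal matrix with $[\boldsymbol{\Lambda}]_{ii} = \lambda_i$ ($i$th eigenvalue)
$\mathbf{U}$: eigenbasis ($i$th column is the eigenvector for $\lambda_i$)
(Principal axes transform) The linear transform $\mathbf{y} = \mathbf{U}^T \mathbf{x}$, using $\mathbf{U}^T$ as transform matrix, is Gaussian with PDF $\mathcal{N}(\mathbf{U}^T \boldsymbol{\mu}, \boldsymbol{\Lambda})$
(Whitening) By using $\boldsymbol{\Lambda}^{-1/2} \mathbf{U}^T$ as transform matrix, $\mathbf{y} = \boldsymbol{\Lambda}^{-1/2} \mathbf{U}^T \mathbf{x}$ is also Gaussian with PDF $\mathcal{N}(\boldsymbol{\Lambda}^{-1/2} \mathbf{U}^T \boldsymbol{\mu}, \mathbf{I})$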
46. Principal Component Analysis (PCA)
Principal component analysis (PCA)
With $\mathbf{C} = \mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^T$
PCA uses several (typically two) eigenvectors corresponding to the largest eigenvalues as projection matrix.
Let
• ($\lambda_1$, $\lambda_2$) be the two largest eigenvalues
• ($\mathbf{u}_1$, $\mathbf{u}_2$) be the corresponding eigenvectors
We use $\mathbf{A} = [\mathbf{u}_1, \mathbf{u}_2]^T$ as transform matrix
The distribution of $\mathbf{y} = \mathbf{A}\mathbf{x}$ can be easily visualized in a low-dimensional (e.g., 2D) space.
If $\dfrac{\lambda_1 + \lambda_2}{\mathrm{tr}(\mathbf{C})} \approx 1$, $\mathbf{y}$ contains most of the information on $\mathbf{x}$
47. Data (distribution) visualization
Pairwise scatter plot is NOT affordable for large M
(Figure) Pairwise scatter plots for M = 4 and for M = 64 (showing only 10 features)
48. Data (distribution) visualization
Pair-wise scatter plots of the Iris dataset (3 classes, 4-dimensional features)
A 2-dimensional projection provides a better representation of clusters and of similarity between features
49. Data (distribution) visualization
Pair-wise scatter plots of the Digits dataset (10 classes, 64-dimensional features), showing only the first 10x10
A 2-dimensional projection provides a better representation of clusters and of similarity between features
51. Machine Learning and Neural Network
Appendix A: Optimization I
Seokhyun Yoon, Electronics Eng., Dankook University
52. Appendix: Optimization
Topics
1. Optimization I: Unconstrained optimization
Definition of optimization problem
Quadratic programming problem
Maximum likelihood estimation as an optimization problem
2. Optimization II: Iterative solutions
Gradient descent and stochastic gradient descent
Coordinate descent
Newton-Raphson method
3. Optimization III: Constrained optimization
Definition
Lagrange multiplier and Rayleigh quotient optimization
Duality in constrained optimization and KKT condition
53. Unconstrained optimization
Definitions of unconstrained optimization
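Minimization: $\min_{\boldsymbol{\theta} \in \mathbb{R}^M} J(\boldsymbol{\theta})$ or $\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta})$
Maximization: $\max_{\boldsymbol{\theta} \in \mathbb{R}^M} J(\boldsymbol{\theta})$ or $\boldsymbol{\theta}^* = \arg\max_{\boldsymbol{\theta}} J(\boldsymbol{\theta})$
where $J(\boldsymbol{\theta})$ is a cost/objective function.
Convex optimization
If $J(\boldsymbol{\theta})$ is a convex function, the solution can be obtained by solving $\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \mathbf{0}$ (as there is only one minimum (maximum)),
where $\nabla_{\boldsymbol{\theta}}$ is the gradient operator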
54. Unconstrained optimization: QP problem
Quadratic programming (QP) problem
QP problem is a special case of convex optimization problem
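$J(\boldsymbol{\theta})$ is a quadratic function of $\boldsymbol{\theta}$
Since $J(\boldsymbol{\theta})$ is a convex function, the solution is given by solving $\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \mathbf{0}$
Solution: $\boldsymbol{\theta}^*$ is obtained in closed form (if the matrix of the quadratic term is invertible)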
55. Unconstrained optimization: Gradient formula
Gradient operators
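For vector $\boldsymbol{\theta}$: $\nabla_{\boldsymbol{\theta}} J = \left[ \dfrac{\partial J}{\partial \theta_1}, \ldots, \dfrac{\partial J}{\partial \theta_M} \right]^T$
For matrix $\mathbf{A}$: $[\nabla_{\mathbf{A}} J]_{ij} = \dfrac{\partial J}{\partial a_{ij}}$
Gradient formula
$\nabla_{\boldsymbol{\theta}} (\mathbf{b}^T \boldsymbol{\theta}) = \mathbf{b}$, $\nabla_{\boldsymbol{\theta}} (\boldsymbol{\theta}^T \mathbf{b}) = \mathbf{b}$
$\nabla_{\boldsymbol{\theta}} (\boldsymbol{\theta}^T \mathbf{A} \boldsymbol{\theta}) = (\mathbf{A} + \mathbf{A}^T) \boldsymbol{\theta}$
$\nabla_{\mathbf{A}} (\mathbf{x}^T \mathbf{A} \mathbf{y}) = \mathbf{x} \mathbf{y}^T$
$\nabla_{\mathbf{A}} \log |\mathbf{A}| = (\mathbf{A}^{-1})^T$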
56. Unconstrained optimization: Gradient formula
Example (Problem A.1):
minimize , i.e., find ∗ ∗
that minimize
and find also the minimum value ∗ ∗
Express in vector-matrix form, i.e.
Use the vector-matrix form to minimize
(use the gradient formula)
Repeat for
57. Maximum likelihood estimation
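Given
Data samples: $\{\mathbf{x}_i\}_{i=1}^{N}$
PDF model: $p(\mathbf{x}; \boldsymbol{\theta})$ with unknown parameter $\boldsymbol{\theta}$
We want to find the $\boldsymbol{\theta}$ that maximizes the
likelihood of $\boldsymbol{\theta}$: $L(\boldsymbol{\theta}) = \prod_{i=1}^{N} p(\mathbf{x}_i; \boldsymbol{\theta})$
Or log-likelihood: $\ell(\boldsymbol{\theta}) = \sum_{i=1}^{N} \log p(\mathbf{x}_i; \boldsymbol{\theta})$
It is a maximization problem
$\boldsymbol{\theta}^* = \arg\max_{\boldsymbol{\theta} \in \mathbb{R}^M} L(\boldsymbol{\theta}) = \arg\max_{\boldsymbol{\theta} \in \mathbb{R}^M} \ell(\boldsymbol{\theta})$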
58. MLE example: Bernoulli trial
Given
Data samples: $\{x_i\}_{i=1}^{N}$, where $x_i \in \{0, 1\}$
PDF model: $p(x; p) = p^{x} (1 - p)^{1 - x}$ with $0 \le p \le 1$
Parameter to estimate: $p$
Likelihood function: $L(p) = \prod_{i=1}^{N} p^{x_i} (1 - p)^{1 - x_i} = p^{k} (1 - p)^{N - k}$
Solution: $p^* = \dfrac{k}{N}$, where k is the number of 1’s that occurred in N trials
Try to verify this by maximizing the likelihood or log-likelihood function.
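A sketch of the suggested verification, maximizing the log-likelihood and assuming $0 < k < N$ so that the stationary point lies in the interior:

```latex
\begin{align*}
\ell(p) &= \log\!\left( p^{k} (1 - p)^{N - k} \right) = k \log p + (N - k) \log (1 - p) \\
\frac{d\ell}{dp} &= \frac{k}{p} - \frac{N - k}{1 - p} = 0
  \;\Longrightarrow\; k (1 - p) = (N - k)\, p
  \;\Longrightarrow\; p^* = \frac{k}{N} \\
\frac{d^2\ell}{dp^2} &= -\frac{k}{p^2} - \frac{N - k}{(1 - p)^2} < 0
  \quad \text{(so the stationary point is a maximum)}
\end{align*}
```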
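59. MLE example: Multi-variate Gaussian PDF (optional)
Given
Data samples: $\{\mathbf{x}_i\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^M$
PDF model: $p(\mathbf{x}; \boldsymbol{\mu}, \mathbf{C}) = \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$
where $\boldsymbol{\mu}$ (mean) and $\mathbf{C}$ (covariance matrix) are the parameters to estimate
Log-likelihood function: $\ell(\boldsymbol{\mu}, \mathbf{C}) = -\dfrac{N}{2} \log |\mathbf{C}| - \dfrac{1}{2} \sum_{i=1}^{N} d_i + \text{const}$
with $d_i = (\mathbf{x}_i - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{x}_i - \boldsymbol{\mu})$
Solution: $\boldsymbol{\mu}^* = \dfrac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i$, $\mathbf{C}^* = \dfrac{1}{N} \sum_{i=1}^{N} (\mathbf{x}_i - \boldsymbol{\mu}^*)(\mathbf{x}_i - \boldsymbol{\mu}^*)^T$
Try to verify this using the gradient formula.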
61. Roadmap
(Diagram) Ch.4 Linear Regression, Ch.6 Ridge/Lasso regression, Ch.7 Logistic regression, Ch.8 Multi-task regression, Ch.9 Neural Network, Ch.10 Recurrent NN, Ch.11 Convolutional NN
62. Ch.4 Regression
Topics
1. Linear regression
2. Vector-matrix representation of linear regression
3. Linear prediction
4. Non-linear regression and overfit
5. Performance evaluation: cross-validation
63. Regression
Elements of regression problem
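Prediction model ($\hat{y} = f(\mathbf{x}; \boldsymbol{\theta})$) with parameters ($\boldsymbol{\theta}$)
Data (observations $\mathbf{x}_i$ and their target values $y_i$)
Cost (loss)/objective function to minimize/maximize
Algorithm to efficiently obtain the optimal or a good solution
Data: $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$
Model with parameters: $\hat{y} = f(\mathbf{x}; \boldsymbol{\theta})$
Cost/loss: $J(\boldsymbol{\theta})$
Algorithm to solve $\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta})$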
64. Regression: Linear regression
A simple example of linear regression
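Data: $\{(x_i, y_i)\}_{i=1}^{N}$ where $x_i, y_i \in \mathbb{R}$
Model: $\hat{y} = \theta_0 + \theta_1 x$ where parameter $\boldsymbol{\theta} = [\theta_0, \theta_1]^T$
Problem is to find the best $\boldsymbol{\theta}$ for the given data
Best in what sense?
(Figure) Scatter plot of the data points $(x_i, y_i)$ on the x-y plane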
65. Regression: Linear regression
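Least squares solution
We want to minimize the residual sum of squares (RSS)
Define error: $e_i = y_i - \hat{y}_i = y_i - (\theta_0 + \theta_1 x_i)$
Minimize: $J(\boldsymbol{\theta}) = \sum_{i=1}^{N} e_i^2$
where $J(\boldsymbol{\theta})$ is a quadratic (convex) function of $\theta_0$ and $\theta_1$
Can use $\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \mathbf{0}$ to find $\theta_0$ and $\theta_1$ in terms of the data $\{(x_i, y_i)\}$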
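For the single-feature model, setting the gradient to zero gives the well-known closed form $\theta_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $\theta_0 = \bar{y} - \theta_1 \bar{x}$. The sketch below applies it to the Math and English scores from the 20-student table; the function name `fit_line` is an illustrative choice.

```python
math_score = [72, 67, 74, 64, 71, 67, 72, 68, 75, 70,
              83, 78, 76, 82, 80, 77, 83, 84, 80, 80]
english = [77, 81, 74, 89, 78, 77, 75, 84, 79, 82,
           69, 76, 74, 67, 74, 71, 67, 67, 70, 69]


def fit_line(x, y):
    """Least squares fit of y ~ theta0 + theta1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    theta1 = sxy / sxx
    theta0 = my - theta1 * mx
    return theta0, theta1


theta0, theta1 = fit_line(math_score, english)
residuals = [yi - (theta0 + theta1 * xi) for xi, yi in zip(math_score, english)]
rss = sum(e * e for e in residuals)
```

At the optimum the residuals sum to zero and are orthogonal to the input, which is exactly what the two equations in $\nabla_{\boldsymbol{\theta}} J = \mathbf{0}$ state.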
66. Regression: Linear regression
Generalization to multi-variate data
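Data: $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ where $\mathbf{x}_i \in \mathbb{R}^M$, $y_i \in \mathbb{R}$
Model: $\hat{y} = \theta_0 + \sum_{j=1}^{M} \theta_j x_j$
where parameter $\boldsymbol{\theta} = [\theta_0, \theta_1, \ldots, \theta_M]^T$
Cost function: Residual sum of squares (RSS)
$J(\boldsymbol{\theta}) = \sum_{i=1}^{N} e_i^2$ where $e_i = y_i - \hat{y}_i$
Problem is to find $\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta} \in \mathbb{R}^{M+1}} J(\boldsymbol{\theta})$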
67. Regression: Model structure
Model and its training at a glance
68. Regression: Linear regression
Solution
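$J(\boldsymbol{\theta})$ is a quadratic function of the $\theta_j$’s (convex function)
Can use $\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \mathbf{0}$ to obtain a system of equations
Then, solve the system of equations to get $\boldsymbol{\theta}^*$
$\partial J / \partial \theta_0 = 0$: $\sum_{i=1}^{N} e_i = 0$
$\partial J / \partial \theta_j = 0$: $\sum_{i=1}^{N} e_i x_{j,i} = 0$, $j = 1, \ldots, M$
Equivalently, in vector-matrix form, $\mathbf{X}\mathbf{X}^T \boldsymbol{\theta} = \mathbf{X}\mathbf{y}$
where $\boldsymbol{\theta} = [\theta_0, \theta_1, \ldots, \theta_M]^T$ and each column of $\mathbf{X}$ is a sample with a leading 1 added for the intercept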
70. Regression: Vector matrix notation
Vector-matrix notation
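Problem is to find the solution of $\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \mathbf{0}$, which is
$\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = 2(\mathbf{X}\mathbf{X}^T \boldsymbol{\theta} - \mathbf{X}\mathbf{y}) = \mathbf{0}$
$\mathbf{X}\mathbf{X}^T \boldsymbol{\theta} = \mathbf{X}\mathbf{y}$
Solution: $\boldsymbol{\theta}^* = (\mathbf{X}\mathbf{X}^T)^{-1} \mathbf{X}\mathbf{y}$
Unique solution exists only if $\mathbf{X}\mathbf{X}^T$ is invertible!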
71. Regression: Linear regression example
Example
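We want to estimate the English score using two models
English score = $\theta_0 + \theta_1 \cdot$ (Math score)
English score = $\theta_0 + \theta_1 \cdot$ (Math score) $+ \theta_2 \cdot$ (Science score)
Find ($\theta_0$, $\theta_1$) and ($\theta_0$, $\theta_1$, $\theta_2$), respectively. (You may use the results in Problem 2.2.)
Homework: finish Problems 4.1 and 4.2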
sid 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Ave. SD.
English 77 81 74 89 78 77 75 84 79 82 69 76 74 67 74 71 67 67 70 69 75 6.09
Math 72 67 74 64 71 67 72 68 75 70 83 78 76 82 80 77 83 84 80 80 75 6.09
Science 75 68 72 68 74 68 75 73 79 72 78 75 72 74 77 70 75 79 77 76 74 3.47
72. Regression: Linear prediction
Linear prediction
Given time series data $x(0), x(1), \ldots, x(N)$
Use p previous samples to predict the next sample, i.e., we want to predict $x(t)$ using ($x(t-1), \ldots, x(t-p)$)
Model: $\hat{x}(t) = \sum_{k=1}^{p} \theta_k x(t-k)$
Example 4.3
𝑡 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
𝑥 -4 -3 14 8 1 -5 -7 -4 -2 6 10 22 15 -15 -20 ?
73. Regression: Linear prediction
Linear prediction
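Target value: $\mathbf{y} = [x(p), x(p+1), \ldots, x(N)]^T$
Data matrix: $\mathbf{X} = [\mathbf{x}_p, \mathbf{x}_{p+1}, \ldots, \mathbf{x}_N]$ with columns $\mathbf{x}_t = [x(t-1), x(t-2), \ldots, x(t-p)]^T$
Model: $\hat{x}(t) = \boldsymbol{\theta}^T \mathbf{x}_t$ (no intercept)
Solution: $\boldsymbol{\theta}^* = \mathbf{R}_{XX}^{-1} \mathbf{r}_{Xy}$
Prediction: $\hat{x}(N+1) = \boldsymbol{\theta}^{*T} \mathbf{x}_{N+1}$
Note: $\mathbf{R}_{XX}$ is a Toeplitz matrix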
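A minimal sketch of a second-order (p = 2) linear predictor for the time series of Example 4.3. The choice p = 2, the helper names, and the use of plain finite sums (the 1/N factors cancel in the solution) are assumptions made for illustration; no numerical answer to the homework is implied.

```python
series = [-4, -3, 14, 8, 1, -5, -7, -4, -2, 6, 10, 22, 15, -15, -20]  # x(0)..x(14)


def solve_2x2(a, b):
    """Solve a 2x2 linear system a @ theta = b with Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]


def fit_predictor_p2(x):
    """Least squares fit of x(t) ~ theta1 * x(t-1) + theta2 * x(t-2)."""
    ts = range(2, len(x))
    r11 = sum(x[t - 1] * x[t - 1] for t in ts)
    r12 = sum(x[t - 1] * x[t - 2] for t in ts)
    r22 = sum(x[t - 2] * x[t - 2] for t in ts)
    q1 = sum(x[t - 1] * x[t] for t in ts)
    q2 = sum(x[t - 2] * x[t] for t in ts)
    return solve_2x2([[r11, r12], [r12, r22]], [q1, q2])


theta = fit_predictor_p2(series)
# Predict the unknown x(15) from the two most recent samples x(14) and x(13).
x15_hat = theta[0] * series[-1] + theta[1] * series[-2]
```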
74. Regression: Linear prediction
Homework: Example 4.3
1) For the given prediction order $p$, write out $\mathbf{X}$ and $\mathbf{y}$, and find $\mathbf{R}_{XX}$ and $\mathbf{r}_{Xy}$.
2) For the same $p$, find the linear predictor parameters $\boldsymbol{\theta}^*$ and predict $x(15)$.
3) Find the mean squared error (MSE). (N=14)
4) Repeat (1)-(3) for the second prediction order.
5) Find the variance of the time-series data, and evaluate the result of (3) against it for each prediction order.
6) Briefly compare and explain the results of (5).
75. Regression: Non-linear model and overfit
Example of Non-linear regression
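Two-feature data: $\mathbf{x} = [x_1, x_2]^T$
Non-linear model: $\hat{y} = \sum_{k} \theta_k \phi_k(\mathbf{x})$
where the $\phi_k(\mathbf{x})$ are non-linear functions of the features (e.g., $x_1^2$, $x_1 x_2$, $x_2^2$)
Defining $\mathbf{X}$ as the matrix whose $i$th column is $[\phi_1(\mathbf{x}_i), \phi_2(\mathbf{x}_i), \ldots]^T$, the RSS cost gives us $\boldsymbol{\theta}^* = (\mathbf{X}\mathbf{X}^T)^{-1} \mathbf{X}\mathbf{y}$
Note
The model is non-linear in the $x_j$’s, but linear in the $\theta_k$’s
The RSS cost function gives us a linear system of equations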
76. Regression: Non-linear model and overfit
Considerations for non-linear regression
If the model is a non-linear function of the $\theta_j$’s, the problem (finding the solution) becomes complicated.
A non-linear model is subject to overfit (large generalization error), especially when the number of samples is relatively small compared to the number of parameters in the model.
We need to check if the model is overfitted to the data or not.
Source: https://slideplayer.com/slide/6825533/
77. Regression: Non-linear model and overfit
Overfit, underfit and just(appropriate) fit
source: https://slideplayer.com/slide/6825533/
source : https://towardsdatascience.com/underfitting-and-overfitting-in-machine-learning-and-how-to-deal-with-it-6fe4a8a49dbf
78. Regression: Non-linear model and overfit
How to check if the model is overfitted or not
If the model is overfitted, the generalization error is much (?) larger than the minimized cost for training data, i.e.,
$J(\mathcal{D}_{\text{test}}; \boldsymbol{\theta}^*) \gg J(\mathcal{D}_{\text{train}}; \boldsymbol{\theta}^*)$
where $\boldsymbol{\theta}^*$ was obtained based on $\mathcal{D}_{\text{train}}$
That’s why we divide the data (samples) into training and test sets for performance evaluation
More systematic approach to test overfit: cross validation
79. Regression: Non-linear model and overfit
L-fold cross-validation
1. We divide the entire data (of N samples) into L groups (of N/L samples per group)
2. Select one group for test and use all others for training
3. Measure $J(\mathcal{D}_{\text{train}}; \boldsymbol{\theta}^*)$ and $J(\mathcal{D}_{\text{test}}; \boldsymbol{\theta}^*)$
4. Repeat 2 and 3 for each group and take the average of both measures
5. Check if $J(\mathcal{D}_{\text{test}}; \boldsymbol{\theta}^*) \gg J(\mathcal{D}_{\text{train}}; \boldsymbol{\theta}^*)$
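The five steps can be sketched as follows for a straight-line model fitted to the Math and English scores. The interleaved fold assignment, the use of the per-sample mean squared error as the cost, and all function names are illustrative assumptions.

```python
math_score = [72, 67, 74, 64, 71, 67, 72, 68, 75, 70,
              83, 78, 76, 82, 80, 77, 83, 84, 80, 80]
english = [77, 81, 74, 89, 78, 77, 75, 84, 79, 82,
           69, 76, 74, 67, 74, 71, 67, 67, 70, 69]


def fit_line(x, y):
    """Least squares fit of y ~ theta0 + theta1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    theta1 = sxy / sxx
    return my - theta1 * mx, theta1


def mse(x, y, theta, idx):
    """Mean squared error of the fitted line over the samples in idx."""
    t0, t1 = theta
    return sum((y[i] - (t0 + t1 * x[i])) ** 2 for i in idx) / len(idx)


def l_fold_cv(x, y, n_folds):
    n = len(x)
    train_costs, test_costs = [], []
    for fold in range(n_folds):
        # Steps 1-2: one group for test, all the others for training.
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        theta = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        # Step 3: measure the cost on both parts.
        train_costs.append(mse(x, y, theta, train_idx))
        test_costs.append(mse(x, y, theta, test_idx))
    # Step 4: average both measures over the folds.
    return sum(train_costs) / n_folds, sum(test_costs) / n_folds


avg_train_mse, avg_test_mse = l_fold_cv(math_score, english, n_folds=5)
```

Step 5 is then a comparison of `avg_test_mse` with `avg_train_mse`: a test cost far above the training cost points to overfitting.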
84. Regularization
Recall linear regression
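Data: $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ where $\mathbf{x}_i \in \mathbb{R}^M$, $y_i \in \mathbb{R}$
Model: $\hat{y} = \boldsymbol{\theta}^T \mathbf{x}$
where $\boldsymbol{\theta} = [\theta_0, \theta_1, \ldots, \theta_M]^T$, $\mathbf{x} = [1, x_1, \ldots, x_M]^T$
Cost function: Residual sum of squares (RSS)
$J(\boldsymbol{\theta}) = \sum_{i=1}^{N} e_i^2 = \|\mathbf{y} - \mathbf{X}^T \boldsymbol{\theta}\|^2$ where $e_i = y_i - \boldsymbol{\theta}^T \mathbf{x}_i$
Problem is to find the solution of $\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \mathbf{0}$, which is
$\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = 2(\mathbf{X}\mathbf{X}^T \boldsymbol{\theta} - \mathbf{X}\mathbf{y}) = \mathbf{0}$, so $\boldsymbol{\theta}^* = (\mathbf{X}\mathbf{X}^T)^{-1} \mathbf{X}\mathbf{y}$
Unique solution exists if $\mathbf{X}\mathbf{X}^T$ is invertible! What if it is NOT?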
85. Regularization
In what case is $\mathbf{X}\mathbf{X}^T$ NOT invertible?
It is not invertible if N < M, i.e., when the number of samples is less than the number of features (e.g., as in bioinformatics or medical applications)
An infinite number of solutions exist
The model parameters and performance can vary widely with small changes in the data (overfit)
Two possible approaches
Increasing sample size (noise injection)
Reducing feature dimension (selecting good features)
86. Regularization
Increasing sample size (noise injection)
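One can double the number of samples by generating a new set of data $\mathbf{X}' = \mathbf{X} + \mathbf{W}$, where $\mathbf{W}$ is a random noise matrix with covariance $\sigma^2 \mathbf{I}$, i.e., $E[\mathbf{w}_i \mathbf{w}_i^T] = \sigma^2 \mathbf{I}$ for each column $\mathbf{w}_i$
Then, use $[\mathbf{X}, \mathbf{X}']$ as the new data
Note that $[\mathbf{X}, \mathbf{X}'][\mathbf{X}, \mathbf{X}']^T \approx 2\mathbf{X}\mathbf{X}^T + N\sigma^2 \mathbf{I}$, which is now invertible “anyway” if 2N > M
It is effectively a “noise injection”
The generalization error can be reduced to some extent
If needed, one can add more copies with different random noise.
The noise variance must be chosen carefully.
Note: the distribution of the noise-injected data may not model the true distribution of $\mathbf{x}$ well.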
87. Regularization
Reducing feature dimension (selecting features)
One can select M’ (<N) features, for example, having
highest covariance with target value y.
However, this does not guarantee a better performance.
An efficient feature selection method (LASSO) will be
discussed shortly
88. Regularization: Ridge and LASSO
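Ridge and LASSO regression: RSS + L1/L2 Penalty
Ridge: $J(\boldsymbol{\theta}) = \mathrm{RSS}(\boldsymbol{\theta}) + \lambda \|\boldsymbol{\theta}\|_2^2$
LASSO: $J(\boldsymbol{\theta}) = \mathrm{RSS}(\boldsymbol{\theta}) + \lambda \|\boldsymbol{\theta}\|_1$
Lp-norm: $\|\boldsymbol{\theta}\|_p = \left( \sum_{j} |\theta_j|^p \right)^{1/p}$
$\lambda$ controls the relative weight between RSS and penalty
Elastic net: RSS + L1 + L2 Penalty, $J(\boldsymbol{\theta}) = \mathrm{RSS}(\boldsymbol{\theta}) + \lambda_1 \|\boldsymbol{\theta}\|_1 + \lambda_2 \|\boldsymbol{\theta}\|_2^2$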
89. Regularization: What is the impact of penalty?
Ridge regression
Ridge regression is simply a QP problem
And the solution is $\boldsymbol{\theta}^* = (\mathbf{X}\mathbf{X}^T + \lambda \mathbf{I})^{-1} \mathbf{X}\mathbf{y}$
$\mathbf{X}\mathbf{X}^T + \lambda \mathbf{I}$ is invertible with $\lambda > 0$, even if $\mathbf{X}\mathbf{X}^T$ is not. (See Problem 6.3.)
It is effectively a “noise injection” (an increase of sample size), and the generalization error can be reduced to some extent
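A small sketch of this effect for M = 2 features and a single sample (N = 1 < M), with no intercept. The sample, target, $\lambda$ value and the function name `ridge_2d` are hypothetical.

```python
def ridge_2d(samples, targets, lam):
    """theta = (X X^T + lam * I)^{-1} X y for two features, no intercept."""
    a11 = sum(x[0] * x[0] for x in samples) + lam
    a12 = sum(x[0] * x[1] for x in samples)
    a22 = sum(x[1] * x[1] for x in samples) + lam
    b1 = sum(x[0] * t for x, t in zip(samples, targets))
    b2 = sum(x[1] * t for x, t in zip(samples, targets))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("X X^T + lam * I is singular")
    # Cramer's rule for the symmetric 2x2 system.
    return [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]


samples = [(1.0, 2.0)]   # one sample, two features: X X^T has rank 1
targets = [3.0]

theta = ridge_2d(samples, targets, lam=0.1)   # well defined thanks to the penalty
```

With `lam=0.0` the same call raises `ValueError`, because infinitely many parameter vectors fit the single sample exactly; the L2 penalty picks out one of them (shrunk toward zero).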
90. Regularization: What is the impact of penalty?
LASSO regression
LASSO stands for Least Absolute Shrinkage and Selection Operator
It tends to select features that describe the target value, y, well
Some $\theta_j$’s vanish if the corresponding features don’t have a strong correlation to y
LASSO effectively reduces M, rather than increasing N.
91. Regularization: What is the impact of penalty?
Further remarks on LASSO regression
$\lambda$ controls sparsity (a high $\lambda$ selects fewer features)
LASSO tends to select one feature from a group of highly correlated variables (features) and ignore the rest.
Unlike the L2 penalty, the L1 penalty is not differentiable at $\theta_j = 0$
LASSO regression is a convex optimization problem, but it is NOT a simple QP problem
Use an iterative algorithm to find the solution, especially when M > N
(Coordinate descent algorithm to be discussed next)
See textbook for the coordinate descent algorithm for LASSO
92. Regularization: Elastic-net
Elastic-net
Elastic-net combines L1 and L2 penalty
L1-penalty selects features (generating sparse model)
L2-penalty reduces generalization error and also encourages grouping effects.
Homework & Computer Lab.
Homework: 6.2, 6.3
Computer lab: ML_practice1_regression_ex_190820.ipynb
93. Machine Learning and Neural Network
Appendix C: Optimization III
Seokhyun Yoon, Electronics Eng., Dankook University
94. Appendix: Optimization
Topics
1. Optimization I: Unconstrained optimization
Definition of optimization problem
Quadratic programming problem
Maximum likelihood estimation as an optimization problem
2. Optimization II: Iterative solutions
Gradient descent and stochastic gradient descent
Coordinate descent
Newton-Raphson method
3. Optimization III: Constrained optimization
Definition
Lagrange multiplier and Rayleigh quotient optimization
Duality in constrained optimization and KKT condition
95. Unconstrained optimization
Definitions of unconstrained optimization
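Minimization: $\min_{\boldsymbol{\theta} \in \mathbb{R}^M} J(\boldsymbol{\theta})$ or $\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta})$
Maximization: $\max_{\boldsymbol{\theta} \in \mathbb{R}^M} J(\boldsymbol{\theta})$ or $\boldsymbol{\theta}^* = \arg\max_{\boldsymbol{\theta}} J(\boldsymbol{\theta})$
where $J(\boldsymbol{\theta})$ is a cost/objective function.
Convex optimization
If $J(\boldsymbol{\theta})$ is a convex function, the solution can be obtained by solving $\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \mathbf{0}$ (as there is only one minimum (maximum))
Sometimes, however, one cannot get a closed-form solution.
What can we do, then?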
96. Iterative search for minimum/maximum
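One idea: gradient search
Gradient descent (for minimization)
Hill climbing (for maximization)
Steps
Given cost function $J(\boldsymbol{\theta})$
Initialize n = 0, $\boldsymbol{\theta}^{(n)} = \mathbf{0}$
Loop (epoch):
1. Compute the gradient at the current position, $\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) \big|_{\boldsymbol{\theta} = \boldsymbol{\theta}^{(n)}}$
2. Update parameters, $\boldsymbol{\theta}^{(n+1)} = \boldsymbol{\theta}^{(n)} - \eta \nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}^{(n)})$
3. n ← n+1
4. Repeat 1~3 until convergence
$\eta$: Learning rate, $0 < \eta \ll 1$
A small enough $\eta$ ensures that $J(\boldsymbol{\theta}^{(n)}) \ge J(\boldsymbol{\theta}^{(n+1)})$
Large $\eta$: Fast convergence, but high MSE due to bouncing
Small $\eta$: Slow convergence, but lower MSE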
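The loop above can be sketched as follows. The quadratic cost $J(\boldsymbol{\theta}) = (\theta_0 - 3)^2 + 2(\theta_1 + 1)^2$, the learning rate and the fixed number of steps are illustrative assumptions; a practical implementation would test for convergence instead of running a fixed number of steps.

```python
def gradient_descent(grad, theta0, eta=0.1, n_steps=200):
    """Repeat theta <- theta - eta * grad(theta) for a fixed number of steps."""
    theta = list(theta0)
    for _ in range(n_steps):
        g = grad(theta)                                      # 1. gradient at current position
        theta = [t - eta * gi for t, gi in zip(theta, g)]    # 2. parameter update
    return theta


def grad_j(theta):
    """Gradient of the hypothetical cost J = (theta0 - 3)^2 + 2 * (theta1 + 1)^2."""
    return [2.0 * (theta[0] - 3.0), 4.0 * (theta[1] + 1.0)]


theta_star = gradient_descent(grad_j, [0.0, 0.0])   # approaches the minimizer (3, -1)
```

For this cost, a step size above 0.5 makes the second coordinate overshoot and diverge, which illustrates why $\eta$ has to be small enough.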
97. Iterative search for minimum/maximum
Stochastic gradient descent (SGD)
Cost is typically a sum of per-sample costs, $J(\boldsymbol{\theta}) = \sum_{i=1}^{N} J_i(\boldsymbol{\theta})$
Update for every sample
Steps
Initialize $\boldsymbol{\theta} = \mathbf{0}$
Outer Loop (epoch): for n = 1,2,…
• Inner loop: for i = 1,2,…,N (number of samples)
$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta \nabla_{\boldsymbol{\theta}} J_i(\boldsymbol{\theta})$
• Repeat inner loop until convergence
98. Iterative search for minimum/maximum
In linear regression
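$\nabla_{\boldsymbol{\theta}} J_i(\boldsymbol{\theta}) = -2 e_i \mathbf{x}_i$ (gradient of the per-sample cost $J_i = e_i^2$)
SGD for linear regression
Initialize $\boldsymbol{\theta} = \mathbf{0}$, n=0
Outer Loop (epoch): for n = 1,2,…
• Inner loop: for i = 1,2,…,N (number of samples)
$e_i = y_i - \mathbf{x}_i^T \boldsymbol{\theta}$
$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} + 2\eta \, e_i \mathbf{x}_i$
• Repeat inner loop until convergence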
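A minimal sketch of SGD for linear regression with an intercept, where each sample is represented as $[1, x]^T$. The noise-free data (generated from y = 1 + 2x), the learning rate and the epoch count are assumptions chosen so that the iteration visibly converges.

```python
def sgd_linear_regression(xs, ys, eta=0.02, n_epochs=500):
    theta = [0.0, 0.0]                       # [theta0 (intercept), theta1 (slope)]
    for _ in range(n_epochs):                # outer loop: epochs
        for x, y in zip(xs, ys):             # inner loop: one update per sample
            e = y - (theta[0] + theta[1] * x)
            # theta <- theta + 2 * eta * e * [1, x]
            theta[0] += 2.0 * eta * e
            theta[1] += 2.0 * eta * e * x
    return theta


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x for x in xs]             # hypothetical noise-free targets
theta_sgd = sgd_linear_regression(xs, ys)    # approaches [1, 2]
```

With noisy targets a constant learning rate leaves the parameters fluctuating around the least squares solution, which is the "bouncing" effect mentioned for large $\eta$.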
99. Iterative search for minimum/maximum
Using momentum
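In SGD, if each sample contains “noise”, it disturbs the algorithm, i.e., the parameters may move in an incorrect direction
It can be alleviated using momentum
$\mathbf{v}^{(n+1)} = \gamma \mathbf{v}^{(n)} - \eta \nabla_{\boldsymbol{\theta}} J_i(\boldsymbol{\theta}^{(n)})$
$\boldsymbol{\theta}^{(n+1)} = \boldsymbol{\theta}^{(n)} + \mathbf{v}^{(n+1)}$
where $0 \le \gamma < 1$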
100. Iterative search for minimum/maximum
Coordinate descent
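Rather than updating every parameter at once,
update the parameters one by one (one coordinate at a time)
$\theta_k^{(n+1)} = \arg\min_{\theta_k} J\left( \theta_1^{(n+1)}, \theta_2^{(n+1)}, \ldots, \theta_{k-1}^{(n+1)}, \theta_k, \theta_{k+1}^{(n)}, \ldots, \theta_M^{(n)} \right)$
$\theta_k^{(n+1)}$ is given by the solution of the equation
$\dfrac{\partial J(\boldsymbol{\theta})}{\partial \theta_k} \Big|_{[\theta_1^{(n+1)}, \theta_2^{(n+1)}, \ldots, \theta_{k-1}^{(n+1)}, \theta_{k+1}^{(n)}, \ldots, \theta_M^{(n)}]} = 0$
Simpler implementation
$\theta_k^{(n+1)} = \arg\min_{\theta_k} J\left( \theta_1^{(n)}, \theta_2^{(n)}, \ldots, \theta_{k-1}^{(n)}, \theta_k, \theta_{k+1}^{(n)}, \ldots, \theta_M^{(n)} \right)$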
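101. Iterative search for minimum/maximum
Coordinate descent for linear regression
Cost: $J(\boldsymbol{\theta}) = \sum_{i=1}^{N} (y_i - \mathbf{x}_i^T\boldsymbol{\theta})^2$
With the other coordinates fixed and $r_{i,k} = y_i - \sum_{j \neq k} x_{i,j}\theta_j$, setting $\partial J/\partial\theta_k = 0$ we have
$\sum_{i=1}^{N} x_{i,k}\,(r_{i,k} - x_{i,k}\theta_k) = 0$
Update rule:
$\theta_k^{(n+1)} = \frac{\sum_{i=1}^{N} x_{i,k}\, r_{i,k}}{\sum_{i=1}^{N} x_{i,k}^2}$
Homework: C.1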
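The coordinate-wise update for least squares can be sketched as below. It assumes the sum-of-squares cost and the closed-form solution of the one-dimensional problem for each coordinate (residual without coordinate k, projected on column k); the data and the function name `coordinate_descent` are hypothetical.

```python
def coordinate_descent(X, y, sweeps=200):
    """Least squares by updating one coordinate theta_k at a time."""
    n, m = len(X), len(X[0])
    theta = [0.0] * m
    for _ in range(sweeps):
        for k in range(m):
            num = 0.0
            den = 0.0
            for i in range(n):
                # Residual of sample i with coordinate k left out.
                r = y[i] - sum(X[i][j] * theta[j] for j in range(m) if j != k)
                num += X[i][k] * r
                den += X[i][k] ** 2
            theta[k] = num / den   # exact minimizer along coordinate k
    return theta


# Hypothetical noiseless data from y = 1 + 2*x, with x_0 = 1 for the intercept.
X_cd = [[1.0, float(x)] for x in range(5)]
y_cd = [1.0 + 2.0 * row[1] for row in X_cd]
theta_cd = coordinate_descent(X_cd, y_cd)
print(theta_cd)
```

This version reuses coordinates already updated within the same sweep; no learning rate is needed because each one-dimensional problem is solved exactly.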
103. Ch.6 Classification: problem formulation
Topics
1. Bayesian approach
2. Bayesian approach under Gaussian assumption
Decision boundary
3. Linear model as a special case
104. Classification: Problem formulation
Data
Data: $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ where $\mathbf{x}_i \in \mathbb{R}^M$, $y_i \in \mathcal{C}$
where $\mathcal{C}$ is a set of categories (classes)
$y_i$’s are categorical and discrete
Bayesian approach: probabilistic model
Assume each class (kth class) is distributed ~ p(x|Hk).
Given new data x, decide its class y as
$\hat{y} = \arg\max_{k \in \mathcal{C}} p(\mathbf{x}|H_k)$
i.e., select the class index for which the conditional
probability of x is maximum
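105. Classification: Bayesian approach
Binary classification
Assume binary classification (for simplicity), i.e., $\mathcal{C} = \{0, 1\}$
Given new data x, decide its class y by comparing log-likelihoods:
decide $\hat{y} = 1$ if $\ln p(\mathbf{x}|H_1) > \ln p(\mathbf{x}|H_0)$, otherwise $\hat{y} = 0$
Binary classification under Gaussian assumption
Assume $p(\mathbf{x}|H_k) = \mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ with parameters $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$.
Then, we have: decide $\hat{y} = 1$ if
$(\mathbf{x}-\boldsymbol{\mu}_0)^T\boldsymbol{\Sigma}_0^{-1}(\mathbf{x}-\boldsymbol{\mu}_0) + \ln|\boldsymbol{\Sigma}_0| > (\mathbf{x}-\boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}_1^{-1}(\mathbf{x}-\boldsymbol{\mu}_1) + \ln|\boldsymbol{\Sigma}_1|$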
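106. Classification: Bayesian approach
Binary classification under Gaussian assumption
Suppose that $|\boldsymbol{\Sigma}_0| = |\boldsymbol{\Sigma}_1|$. Then, we have: decide $\hat{y} = 1$ if
$(\mathbf{x}-\boldsymbol{\mu}_0)^T\boldsymbol{\Sigma}_0^{-1}(\mathbf{x}-\boldsymbol{\mu}_0) > (\mathbf{x}-\boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}_1^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)$
i.e., compare (Mahalanobis) distances of x from class centers
[Figure: class-conditional densities $p(\mathbf{x}|H_0)$ and $p(\mathbf{x}|H_1)$]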
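107. Classification: Decision boundary
Decision boundary
It is a “surface” where $p(\mathbf{x}|H_0) = p(\mathbf{x}|H_1)$, i.e.,
$(\mathbf{x}-\boldsymbol{\mu}_0)^T\boldsymbol{\Sigma}_0^{-1}(\mathbf{x}-\boldsymbol{\mu}_0) = (\mathbf{x}-\boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}_1^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)$
It can be written as
$\mathbf{x}^T(\boldsymbol{\Sigma}_0^{-1} - \boldsymbol{\Sigma}_1^{-1})\mathbf{x} + \mathbf{b}^T\mathbf{x} + c = 0$
where
$\mathbf{b} = -2(\boldsymbol{\Sigma}_0^{-1}\boldsymbol{\mu}_0 - \boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1)$ (a vector)
$c = \boldsymbol{\mu}_0^T\boldsymbol{\Sigma}_0^{-1}\boldsymbol{\mu}_0 - \boldsymbol{\mu}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1$ (a scalar)
The decision boundary is given by a “conic section”,
which can be a hyperbola, an ellipse or a (hyper) plane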
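108. Classification: Linear model
Linear model for binary classification
Suppose further that $\boldsymbol{\Sigma}_0 = \boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}$.
Then, the decision boundary becomes
$2(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)^T\boldsymbol{\Sigma}^{-1}\mathbf{x} + \boldsymbol{\mu}_0^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_0 - \boldsymbol{\mu}_1^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1 = 0$
which is a (hyper) plane
And the decision rule becomes: decide $\hat{y} = 1$ if
$\boldsymbol{\theta}^T\mathbf{x} + \theta_0 > 0$
or equivalently, $\hat{y} = \mathrm{step}(\boldsymbol{\theta}^T\mathbf{x} + \theta_0)$
Model parameters: $\boldsymbol{\theta}$ and $\theta_0$ (intercept)
Linear classifier partitions ( ? ) into
non-overlapping areas using ( ? )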
109. Classification: Linear model vs. Bayesian approach
Bayesian classifier versus linear classifier
[Figure: class-conditional densities $p(\mathbf{x}|H_0)$ and $p(\mathbf{x}|H_1)$ with the linear decision boundary $\boldsymbol{\theta}^T\mathbf{x} + \theta_0 = 0$]
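110. Classification: Summary
Binary classification: summary
Bayesian approach: decide $\hat{y} = 1$ if $\ln p(\mathbf{x}|H_1) > \ln p(\mathbf{x}|H_0)$, otherwise $\hat{y} = 0$
Under Gaussian assumption (with $|\boldsymbol{\Sigma}_0| = |\boldsymbol{\Sigma}_1|$): decide $\hat{y} = 1$ if
$(\mathbf{x}-\boldsymbol{\mu}_0)^T\boldsymbol{\Sigma}_0^{-1}(\mathbf{x}-\boldsymbol{\mu}_0) > (\mathbf{x}-\boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}_1^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)$
With $\boldsymbol{\Sigma}_0 = \boldsymbol{\Sigma}_1$, we get the linear model: decide $\hat{y} = 1$ if
$\boldsymbol{\theta}^T\mathbf{x} + \theta_0 > 0$
or equivalently, $\hat{y} = \mathrm{step}(\boldsymbol{\theta}^T\mathbf{x} + \theta_0)$
Our main focus is on this linear model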
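111. Classification: Naive implementation
Naive implementation
Given data: $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ where $\mathbf{x}_i \in \mathbb{R}^M$, $y_i \in \mathcal{C}$
$\mathcal{C}$ is a set of categories (classes)
$y_i$’s are categorical and discrete, e.g., $\mathcal{C} = \{0, 1\}$
Divide data into $\mathcal{D}_0$ and $\mathcal{D}_1$ (for each class)
Compute $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$ for $k = 0, 1$
Use $p(\mathbf{x}|H_k)$’s for classification
This is not our focus, though.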
112. Classification: Roadmap
Based on the linear model $\hat{y} = \mathrm{step}(\boldsymbol{\theta}^T\mathbf{x} + \theta_0)$,
Ch.7: We will develop a training (learning) rule, where we obtain $\boldsymbol{\theta}$ and $\theta_0$ directly from data by solving an optimization problem
Ch.8: The linear model will be extended for multinomial classification problems
Ch.9: The model will be further extended to get the neural network model
115. Roadmap
Ch.4 Linear regression → Ch.6 Ridge/Lasso regression → Ch.7 Logistic regression → Ch.8 Multi-task regression → Ch.9 Neural network → Ch.10 Recurrent NN → Ch.11 Convolutional NN
[Figure: chapter roadmap with small network sketches, including a two-layer recurrent network with input x(t), hidden state h(t) fed back as h(t-1) through a delay D, and output ŷ(t)]
116. Ch.7 Logistic Regression for binary classification
Topics
1. Logistic regression:
Model with logistic sigmoid function
2. Parameter optimization:
Likelihood function as an objective function
Application of gradient search algorithm
3. Performance measures of binary classifier
Confusion matrix, True Positive and False negative
Accuracy, Sensitivity, Specificity
ROC and AUC
117. Logistic regression: Model
Recall the (generalized) linear model for binary classification, $\hat{y} = f(\boldsymbol{\theta}^T\mathbf{x})$
It is a linear regressor if $f(z) = z$
It is a linear classifier if $f(z) = \mathrm{step}(z)$
It is a logistic regressor if $f(z) = \frac{1}{1 + e^{-z}}$
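118. Logistic regression: Model
Interpretation of logistic regression model
$\hat{y} = s(\boldsymbol{\theta}^T\mathbf{x})$, where $s(z) = \frac{1}{1 + e^{-z}}$
$\hat{y}$ can be regarded as $\Pr\{y = 1 | \mathbf{x}\}$, so that
$\Pr\{y = 1 | \mathbf{x}\} = \frac{1}{1 + e^{-\boldsymbol{\theta}^T\mathbf{x}}} = \frac{e^{\boldsymbol{\theta}^T\mathbf{x}}}{1 + e^{\boldsymbol{\theta}^T\mathbf{x}}}$
$\Pr\{y = 0 | \mathbf{x}\} = 1 - \hat{y} = \frac{1}{1 + e^{\boldsymbol{\theta}^T\mathbf{x}}}$
$\hat{y}$ can also be interpreted as a “class estimate”.
In both cases, if $\hat{y} > 0.5$, $\mathbf{x}$ is likely to be class 1. Otherwise class 0.
$e^{\boldsymbol{\theta}^T\mathbf{x}}$ is called the “odds” of being class 1. (Note: $e^{\boldsymbol{\theta}^T\mathbf{x}} = \hat{y}/(1 - \hat{y})$.)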
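120. Logistic regression: Cost function
Cost function: Negative log-likelihood
$\hat{y}_i = s(\boldsymbol{\theta}^T\mathbf{x}_i)$ can be interpreted as the probability (likelihood) that $\mathbf{x}_i$ belongs to class 1.
The likelihood that $\mathbf{x}_i$ belongs to the target class $y_i$ is given by
$\hat{y}_i^{\,y_i}(1 - \hat{y}_i)^{1 - y_i}$
Log-likelihood as an “objective” to maximize
$L(\boldsymbol{\theta}) = \sum_{i=1}^{N} \left[ y_i \ln \hat{y}_i + (1 - y_i)\ln(1 - \hat{y}_i) \right]$
Can also be formulated as minimization of $J(\boldsymbol{\theta}) = -L(\boldsymbol{\theta})$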
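121. Logistic regression
Elements of regression/classification problem
Data (observations $\mathbf{x}_i$ and their target values $y_i$)
Prediction model ($\hat{y} = f(\mathbf{x}; \boldsymbol{\theta})$) with parameters ($\boldsymbol{\theta}$)
Cost (loss)/objective function to minimize/maximize
Algorithm to efficiently obtain the optimal or a good solution
For logistic regression:
Data: $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$
Model with parameters: $\hat{y} = s(\boldsymbol{\theta}^T\mathbf{x}) = \frac{1}{1 + e^{-\boldsymbol{\theta}^T\mathbf{x}}}$
Cost/loss: $-\sum_{i=1}^{N} \left[ y_i \ln \hat{y}_i + (1 - y_i)\ln(1 - \hat{y}_i) \right]$
Algorithm to min/maximize: Gradient descent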
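122. Logistic regression: Optimization
Optimization
The log-likelihood $L(\boldsymbol{\theta}) = \sum_{i} [y_i \ln \hat{y}_i + (1 - y_i)\ln(1 - \hat{y}_i)]$, with $\hat{y}_i = s(\boldsymbol{\theta}^T\mathbf{x}_i)$, contains the non-linear function $s(\cdot)$.
$\max_{\boldsymbol{\theta}} L(\boldsymbol{\theta})$ isn’t a simple QP problem.
We resort to gradient search to get an optimal (or a good) solution.
To perform gradient search, we need the gradient of the cost,
which is given by (see textbook p.68)
$\nabla_{\boldsymbol{\theta}} L(\boldsymbol{\theta}) = \sum_{i=1}^{N} (y_i - \hat{y}_i)\,\mathbf{x}_i$
Algorithm (pseudo code)
Initialize $\boldsymbol{\theta}^{(0)}$
$\boldsymbol{\theta}^{(n+1)} = \boldsymbol{\theta}^{(n)} + \eta \nabla_{\boldsymbol{\theta}} L(\boldsymbol{\theta}^{(n)})$
for $n = 0, 1, 2, \ldots$
“+” means hill-climbing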
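The hill-climbing loop for logistic regression can be sketched as follows, assuming the log-likelihood gradient is the sum over samples of (target minus sigmoid output) times the input. The toy data set, the learning rate, the iteration count and the function names are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, eta=0.1, iters=2000):
    """Batch gradient ascent on the log-likelihood of a logistic regressor."""
    theta = [0.0] * len(X[0])
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(t * v for t, v in zip(theta, xi)))
            for j, xij in enumerate(xi):
                grad[j] += err * xij
        # "+" because we climb the log-likelihood instead of descending a cost.
        theta = [t + eta * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical, non-separable 1-D data; the first feature is the constant 1.
X_lr = [[1.0, -2.0], [1.0, -1.0], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
y_lr = [0, 0, 1, 0, 1, 1]
theta_lr = fit_logistic(X_lr, y_lr)
p_high = sigmoid(theta_lr[0] + 2.0 * theta_lr[1])
p_low = sigmoid(theta_lr[0] - 2.0 * theta_lr[1])
print(theta_lr, p_high, p_low)
```

The data overlap on purpose: with perfectly separable classes the likelihood has no finite maximum and the weights keep growing.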
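123. Logistic regression: Another cost function
Another cost: Residual sum of squares (RSS)
$\hat{y}_i = s(\boldsymbol{\theta}^T\mathbf{x}_i)$ can also be interpreted as a class estimate.
Define estimation error: $e_i = y_i - \hat{y}_i$
RSS as a cost to minimize: $J(\boldsymbol{\theta}) = \tfrac{1}{2}\sum_{i=1}^{N} e_i^2$
Gradient (see textbook p.68)
$\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = -\sum_{i=1}^{N} e_i\,\hat{y}_i(1 - \hat{y}_i)\,\mathbf{x}_i$
Gradient descent
$\boldsymbol{\theta}^{(n+1)} = \boldsymbol{\theta}^{(n)} - \eta \nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}^{(n)})$
for $n = 0, 1, 2, \ldots$
“-” means gradient descent
What’s the difference from likelihood-based optimization?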
124. Performance measures of binary classifier
Confusion matrix
Why do we need measures other than accuracy?
In some applications, FN (FP) causes a more serious problem than FP (FN)
E.g., in a medical application, you want to decide if a person has a tumor (P) or not (N).
It isn’t a big problem if a normal person (without a tumor) is decided to have a tumor (FP). But the opposite case (a person with a tumor decided as normal, FN) may cause a serious problem.
You may want to minimize FPR while requiring TPR to be no less than a certain threshold.
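To make the confusion-matrix measures concrete, here is a small sketch that counts TP, FN, FP and TN and derives accuracy, sensitivity (TPR) and specificity (1 - FPR). The labels and predictions are made up for illustration, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels, 1 = positive, 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn


def binary_metrics(y_true, y_pred):
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # TPR
        "specificity": tn / (tn + fp),   # 1 - FPR
    }


# Hypothetical imbalanced example: 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
m = binary_metrics(y_true, y_pred)
print(m)
```

In this example the accuracy is 0.7 even though two of the three positives are missed, which is the reason accuracy alone can be misleading.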
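125. Performance measures of binary classifier
ROC and AUC
ROC: Receiver operating characteristic
AUC: Area under (the ROC) curve
[Figure: ROC curve with FPR = FP/(FP+TN) on the horizontal axis and TPR = TP/(TP+FN) on the vertical axis, both ranging from 0 to 1; AUC is the area under the curve. The curve shows how performance changes with the decision boundary between Positive(1) and Negative(0): moving it one way, TP and FP go down while TN and FN go up; moving it the other way, TP and FP go up while TN and FN go down.]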
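The ROC curve can be traced by sweeping the decision threshold over the classifier scores and recording (FPR, TPR) at each step; the AUC then follows from the trapezoid rule. The scores and labels below are invented for illustration, and the function names are hypothetical.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold over the scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for th in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= th and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= th and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores (higher means "more likely positive").
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pts = roc_points(labels, scores)
area = auc(pts)
print(pts, area)
```

Lowering the threshold can only increase TP and FP, so the curve moves monotonically from (0, 0) to (1, 1).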
127. Machine Learning and Neural Network
Ch.8: Multi-task regression
and multinomial classification
Seokhyun Yoon, Electronics Eng., Dankook University
128. Roadmap
Ch.4 Linear regression → Ch.6 Ridge/Lasso regression → Ch.7 Logistic regression → Ch.8 Multi-task regression → Ch.9 Neural network → Ch.10 Recurrent NN → Ch.11 Convolutional NN
129. Ch.8 Multiclass classification
Topics
1. Multi-task regression
2. Multinomial classification
3. Generalized linear model
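130. Multi-task linear regression
Linear regression with vector target
Data: $\{(\mathbf{x}_i, \mathbf{y}_i)\}_{i=1}^{N}$ where $\mathbf{x}_i \in \mathbb{R}^M$, $\mathbf{y}_i \in \mathbb{R}^K$
where $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_N]$: KxN matrix with each column being $\mathbf{y}_i$
Linear model: $\hat{\mathbf{y}} = \boldsymbol{\Theta}\mathbf{x}$ (with $\mathbf{x}$ 1-augmented)
where $\boldsymbol{\Theta}$: Kx(M+1) matrix (including intercept)
Define
$\boldsymbol{\theta}^{(k)T}$: $k$th row of $\boldsymbol{\Theta}$. ($\boldsymbol{\theta}^{(k)}$: $k$th column of $\boldsymbol{\Theta}^T$)
$\mathbf{y}^{(k)}$: $k$th column of $\mathbf{Y}^T$
Cost function (RSS)
$J(\boldsymbol{\Theta}) = \sum_{i=1}^{N} (\mathbf{y}_i - \boldsymbol{\Theta}\mathbf{x}_i)^T(\mathbf{y}_i - \boldsymbol{\Theta}\mathbf{x}_i)$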
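131. Multi-task linear regression
Linear regression with vector target
Cost function is a sum of RSS for each target value ($k = 1, \ldots, K$)
$J(\boldsymbol{\Theta}) = \sum_{k=1}^{K} \|\mathbf{y}^{(k)} - \mathbf{X}^T\boldsymbol{\theta}^{(k)}\|^2$, with $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]$
Optimization can be performed separately for each target value, i.e.,
$\min_{\boldsymbol{\Theta}} J(\boldsymbol{\Theta}) = \sum_{k=1}^{K} \min_{\boldsymbol{\theta}^{(k)}} \|\mathbf{y}^{(k)} - \mathbf{X}^T\boldsymbol{\theta}^{(k)}\|^2$
where $\min_{\boldsymbol{\theta}^{(k)}}$ gives $\boldsymbol{\theta}^{(k)} = (\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{y}^{(k)}$
And $\min_{\boldsymbol{\Theta}}$ gives $\boldsymbol{\Theta}^T = (\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{Y}^T$
Can be implemented using K parallel linear regressors with scalar target value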
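132. Multi-task linear regression
Linear regression with vector target
Can be implemented using K parallel linear regressors with scalar target value
Alternative expression of cost function
$J(\boldsymbol{\Theta}) = \|\mathbf{Y} - \boldsymbol{\Theta}\mathbf{X}\|_F^2 = \mathrm{tr}\left[(\mathbf{Y} - \boldsymbol{\Theta}\mathbf{X})^T(\mathbf{Y} - \boldsymbol{\Theta}\mathbf{X})\right]$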
133. Multinomial classification: two approaches
Multinomial classification can be implemented using multiple binary classifiers.
Two approaches (K class case)
One against the rest:
we use K binary classifiers, one for each class.
Each classifier (kth classifier) computes, for example, the likelihood $\hat{p}_k(\mathbf{x})$ of input x belonging to the kth class.
Decide the class having the highest likelihood
Pairwise binary classification + majority voting:
we use K(K-1)/2 binary classifiers, one for each pair of classes.
Decide the class by taking the majority of the winners.
134. Multinomial logistic regression
Data
Data: $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ where $\mathbf{x}_i \in \mathbb{R}^M$, $y_i \in \mathcal{C}$
where $\mathcal{C} = \{1, 2, \ldots, K\}$ is a set of categories (classes)
$y_i$’s are categorical and discrete
Considerations
A (single-task) logistic regressor using $y_i$ (integer) as the target value will not work well (because $y_i$’s are categorical, while a single-task regressor regards $y_i$’s as numerical.)
One approach is to encode $y_i$’s to a binary vector (of size Kx1) and use a multi-task logistic regressor
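135. Multinomial logistic regression
Model
Softmax function on top of multi-task linear regressor
Multi-task linear regressor
$o_k = \boldsymbol{\theta}^{(k)T}\mathbf{x}$ for $k = 1, \ldots, K$
$e^{o_k}$ (odds of belonging to class $k$)
Or, collectively, $\mathbf{o} = \boldsymbol{\Theta}\mathbf{x}$
softmax function
$S_k(\mathbf{o}) = \frac{e^{o_k}}{\sum_{j=1}^{K} e^{o_j}}$
(likelihood of belonging to class $k$)
Note that $0 \le S_k(\mathbf{o}) \le 1$ and $\sum_{k=1}^{K} S_k(\mathbf{o}) = 1$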
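A short sketch of the softmax function follows. Subtracting the largest score before exponentiating is an addition made here for numerical safety (it does not change the result); the example scores are arbitrary.

```python
import math


def softmax(o):
    """Softmax over a list of scores o_1..o_K; returns likelihoods summing to 1."""
    shift = max(o)                       # guards against overflow in exp()
    exps = [math.exp(v - shift) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


p = softmax([2.0, 1.0, 0.1])
p_equal = softmax([0.0, 0.0])
print(p, p_equal)
```

The largest score always receives the largest likelihood, so taking the arg max of the softmax output gives the same class as taking the arg max of the raw scores.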
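136. Multinomial logistic regression
Cost/objective
$S_k(\boldsymbol{\Theta}\mathbf{x}_i)$ can be interpreted as Pr{$\mathbf{x}_i$ belongs to class $k$}
Log-likelihood can be used as the objective to maximize.
$L(\boldsymbol{\Theta}) = \sum_{i=1}^{N} \ln S_{y_i}(\boldsymbol{\Theta}\mathbf{x}_i)$
Gradient:
$\nabla_{\boldsymbol{\theta}^{(k)}} L(\boldsymbol{\Theta}) = \sum_{i=1}^{N} \left(\delta_{k,y_i} - S_k(\boldsymbol{\Theta}\mathbf{x}_i)\right)\mathbf{x}_i$
where $\delta_{k,y_i} = 1$ if $k = y_i$ and 0 otherwise
Gradient search:
$\boldsymbol{\Theta}^{(n+1)} = \boldsymbol{\Theta}^{(n)} + \eta \nabla_{\boldsymbol{\Theta}} L(\boldsymbol{\Theta}^{(n)})$ for $n = 0, 1, 2, \ldots$
Since $0 \le S_k(\boldsymbol{\Theta}\mathbf{x}_i) \le 1$, the direction of the gradient is either $\mathbf{x}_i$ for $k = y_i$ or $-\mathbf{x}_i$ for $k \ne y_i$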
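137. Multinomial logistic regression: more issues
One hot encoding
One hot encoding is a mapping of an integer $y \in \{1, \ldots, K\}$
to a binary vector $\mathbf{y} = [y_1, y_2, .., y_K]^T$ such that $y_k = \delta_{k,y}$,
i.e., only one element of $\mathbf{y}$ is 1 and all others are 0.
Example: with $K = 3$, $y = 2 \mapsto \mathbf{y} = [0, 1, 0]^T$
By encoding all the target values $y_1, y_2, .., y_N$,
we have $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N]$
$\mathbf{Y}$ is a KxN matrix with each column being $\mathbf{y}_i$
Then, the gradient is given by
$\nabla_{\boldsymbol{\theta}^{(k)}} L(\boldsymbol{\Theta}) = \sum_{i=1}^{N} \left(y_{k,i} - S_k(\boldsymbol{\Theta}\mathbf{x}_i)\right)\mathbf{x}_i$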
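138. Multinomial logistic regression : more issues
Cross-entropy
With one hot encoding: $\mathbf{y}_i = [y_{1,i}, y_{2,i}, .., y_{K,i}]^T$
is the probability mass of $y_i$
Posterior likelihood of $\mathbf{x}_i$: $\hat{\mathbf{p}}_i = [\hat{p}_{1,i}, \hat{p}_{2,i}, \ldots, \hat{p}_{K,i}]^T$
with $\hat{p}_{k,i} = S_k(\boldsymbol{\Theta}\mathbf{x}_i)$
The cross entropy between $\mathbf{y}_i$ and $\hat{\mathbf{p}}_i$ is given by
$H(\mathbf{y}_i, \hat{\mathbf{p}}_i) = -\sum_{k=1}^{K} y_{k,i} \ln \hat{p}_{k,i}$
We call $J(\boldsymbol{\Theta}) = \sum_{i=1}^{N} H(\mathbf{y}_i, \hat{\mathbf{p}}_i)$ the “cross-entropy cost”.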
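One hot encoding and the cross-entropy between a one-hot target and a vector of predicted likelihoods can be sketched as below. Class labels are assumed to run from 1 to K, and the predicted likelihoods are arbitrary example values.

```python
import math


def one_hot(y, K):
    """Map an integer class label y in {1, ..., K} to a binary vector of length K."""
    return [1.0 if k == y else 0.0 for k in range(1, K + 1)]


def cross_entropy(y_vec, p_hat):
    """H(y, p) = -sum_k y_k * ln(p_k); terms with y_k = 0 contribute nothing."""
    return -sum(yk * math.log(pk) for yk, pk in zip(y_vec, p_hat) if yk > 0.0)


y_vec = one_hot(2, 3)
loss = cross_entropy(y_vec, [0.2, 0.5, 0.3])
print(y_vec, loss)
```

With a one-hot target only the likelihood assigned to the true class matters, so the cross-entropy reduces to the negative log-likelihood of that class.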
139. Multinomial logistic regression : more issues
Multi-task logistic regressor
Using one hot encoding, one can replace (for simplicity) the softmax function with K separate logistic sigmoid functions
→ K parallel logistic regressors.
Performance ?
[Figure: K parallel logistic regressors: inputs x0, x1, x2, …, xM feed neurons 1, 2, …, K with outputs o1, o2, …, oK, each passed through a separate sigmoid s(o1), s(o2), …, s(oK) to give p̂1, p̂2, …, p̂K]
Other remarks
Multinomial logistic regression is a one-against-the-rest approach.
Once the likelihoods $\hat{p}_k$’s are obtained, the class estimate is determined by
$\hat{y} = \arg\max_{k} \hat{p}_k$
140. Multinomial logistic regression: generalization
Generalized linear model
Linear regression and logistic regressions can be represented by one structure
Consisting of an “activation function” on top of a multi-task linear regressor
The output can be interpreted in various ways (e.g., as likelihoods or as estimates of target value)
Also, there are many options for the activation function (e.g., linear, sigmoid or tanh)
If the input is categorical, apply one hot encoding before feeding it to the regressor (the input dimension must be changed too)
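141. Multinomial logistic regression: generalization
Generalized linear model
Regularization can also be applied if desired by defining a cost with penalty
$J_{\lambda}(\boldsymbol{\Theta}) = J(\boldsymbol{\Theta}) + \lambda\|\boldsymbol{\Theta}\|_F^2$
where $\|\boldsymbol{\Theta}\|_F^2$ is the sum of squares of all elements of $\boldsymbol{\Theta}$
For linear regression: $J(\boldsymbol{\Theta})$ is the RSS
For logistic regression: $J(\boldsymbol{\Theta})$ is the negative log-likelihood (cross-entropy)
The model basically regards the input and output as numerical. So, if you deal with categorical values, you need to apply one hot encoding first.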
144. Ch.9 Artificial neural network
Topics
1. Perceptron and artificial neural network (NN)
2. Neural network model
3. Training NN: backpropagation
4. Some issues on NN
Convergence to local minima
Overfitting
Vanishing gradient problem
5. Practical considerations (building and training NN)
145. Roadmap
Ch.4 Linear regression → Ch.6 Ridge/Lasso regression → Ch.7 Logistic regression → Ch.8 Multi-task regression → Ch.9 Neural network → Ch.10 Recurrent NN → Ch.11 Convolutional NN
146. ANN: Perceptron
Perceptron
It is an array of interconnected neurons, exactly the same as in generalized linear models
It was suggested as a mimic of the biological neuron
[Figure: biological neuron (source: https://en.wikipedia.org/wiki/Biological_neuron_model) next to the regression model (artificial neuron): input nodes x0, x1, x2, …, xM (dendrites), weights $\theta_0, \theta_1, \theta_2, \ldots, \theta_M$ (synaptic), net = $\boldsymbol{\theta}^T\mathbf{x}$, activation function f(net), output node y (axon terminal)]
147. ANN: Perceptron
Perceptron
A multi-task regression model is a horizontal array of artificial neurons, with either combined activation or separate activation
[Figure, with combined activation: inputs x0, x1, x2, …, xM feed neurons 1, 2, …, K with outputs o1, o2, …, oK, which pass through one combined activation f(o1, o2,…, oK) to give ŷ1, ŷ2, …, ŷK]
[Figure, with separate activation: the same array, but each output passes through its own activation s(o1), s(o2), …, s(oK) to give p̂1, p̂2, …, p̂K]
148. ANN: Multi-layer Perceptron
Multi-layer Perceptron
Consists of multiple layers of multi-task regressors vertically stacked
Output of one layer is fed to the input of the next layer.
Number of layers and number of neurons per layer can be arbitrarily set
Non-linear activation functions make it different from the single-layer (linear) model, i.e., they make the model non-linear
Can be used for regression and classification
149. ANN: Multi-layer Perceptron
Operations
Feedforward (prediction phase): For a given input $\mathbf{x}$ and the current parameters $\mathbf{W}$, it produces an output $\hat{\mathbf{y}}$
Feedback (training phase): For each input $\mathbf{x}$ and target vector $\mathbf{y}$, the parameters $\mathbf{W}$’s are updated
Gradient search is used for some optimality criteria
150. ANN: Multi-layer Perceptron
Structure definition
Number of layers: $L$
Number of neurons per layer: $N_l$, $l = 1, \ldots, L$
Full connection assumed
Signals and parameters
Input: $\mathbf{x} = \mathbf{h}^{(0)}$
Target vector: $\mathbf{y}$
Weight matrix: $\mathbf{W}^{(l)}$
Hidden layer output: $\mathbf{h}^{(l)}$
Final output: $\hat{\mathbf{y}} = \mathbf{h}^{(L)}$
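151. ANN: Multi-layer Perceptron
Feedforward (prediction)
From $l = 1$ to $L$
1) $\mathbf{o}^{(l)} = \mathbf{W}^{(l)}\tilde{\mathbf{h}}^{(l-1)}$
2) $\mathbf{h}^{(l)} = f^{(l)}(\mathbf{o}^{(l)})$
More simply, $\mathbf{h}^{(l)} = f^{(l)}(\mathbf{W}^{(l)}\tilde{\mathbf{h}}^{(l-1)})$
$\tilde{\mathbf{h}}^{(l)}$ is the 1-augmented version of $\mathbf{h}^{(l)}$, with $\mathbf{h}^{(0)} = \mathbf{x}$
$\mathbf{W}^{(l)}$ is an $N_l \times (N_{l-1} + 1)$ matrix including the “intercept”
The activation function is applied to each element of $\mathbf{o}^{(l)}$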
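The feedforward pass can be sketched as a loop over layers, each one augmenting its input with a constant 1, applying the weight matrix and then the activation element-wise. The two-layer network below uses hand-picked (not trained) weights that happen to compute XOR, which a single linear layer cannot do; all names and values are illustrative assumptions.

```python
def feedforward(x, weights, activations):
    """Propagate x through the layers: h <- f(W * [1, h]) for each layer."""
    h = list(x)
    for W, f in zip(weights, activations):
        h_aug = [1.0] + h                                   # 1-augmented input
        o = [sum(w * v for w, v in zip(row, h_aug)) for row in W]
        h = [f(v) for v in o]                               # element-wise activation
    return h


def relu(z):
    return max(0.0, z)


def linear(z):
    return z


# Hand-set weights; the first column of each matrix is the intercept.
W1 = [[0.0, 1.0, 1.0],      # o1 = x1 + x2
      [-1.0, 1.0, 1.0]]     # o2 = x1 + x2 - 1
W2 = [[0.0, 1.0, -2.0]]     # y = relu(o1) - 2 * relu(o2)

outputs = [feedforward([a, b], [W1, W2], [relu, linear])[0]
           for a, b in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]]
print(outputs)
```

Replacing `relu` with `linear` in the hidden layer collapses the network into a single linear map, which is why a non-linear hidden activation is needed.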
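152. ANN: Multi-layer Perceptron
Feedback (training)
Assume training is performed on a per-sample basis, i.e., SGD
Cost function (RSS):
$J(\mathbf{W}^{(1)}, \mathbf{W}^{(2)}, \ldots, \mathbf{W}^{(L)}) = \tfrac{1}{2}(\mathbf{y} - \mathbf{h}^{(L)})^T(\mathbf{y} - \mathbf{h}^{(L)})$
Cross-entropy can also be used as cost (not covered here)
To train the model, we need $\nabla_{\mathbf{W}^{(l)}} J$ for $l = 1, \ldots, L$
Top layer is easy: $\nabla_{\mathbf{W}^{(L)}} J = \boldsymbol{\delta}^{(L)}\,(\tilde{\mathbf{h}}^{(L-1)})^T$,
where $\boldsymbol{\delta}^{(L)} = (\mathbf{h}^{(L)} - \mathbf{y}) \odot f'(\mathbf{o}^{(L)})$, $\mathbf{o}^{(L)} = \mathbf{W}^{(L)}\tilde{\mathbf{h}}^{(L-1)}$
and $\mathbf{h}^{(L)} = f(\mathbf{o}^{(L)})$
Layer below? We need to apply the chain rule
The problem, however, is not as simple as you expect.
See textbook, section 9.3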
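153. ANN: Multi-layer Perceptron
Feedback (training)
The training starts from the top layer and runs downward, one layer at a time.
Training: From $l = L$ to $1$:
$\mathbf{W}^{(l)} \leftarrow \mathbf{W}^{(l)} - \eta\,\mathbf{G}^{(l)}$
with $\mathbf{G}^{(l)} = \nabla_{\mathbf{W}^{(l)}} J(\mathbf{W}^{(1)}, \mathbf{W}^{(2)}, \ldots, \mathbf{W}^{(L)}) = \boldsymbol{\delta}^{(l)}\,(\tilde{\mathbf{h}}^{(l-1)})^T$
where, by applying the chain rule (see textbook p.81-82)
$\boldsymbol{\delta}^{(l)} = f'(\mathbf{o}^{(l)}) \odot \left(\bar{\mathbf{W}}^{(l+1)T}\boldsymbol{\delta}^{(l+1)}\right)$ for $l < L$
($\bar{\mathbf{W}}^{(l+1)}$ is $\mathbf{W}^{(l+1)}$ without its intercept column, $\mathbf{o}^{(l)} = \mathbf{W}^{(l)}\tilde{\mathbf{h}}^{(l-1)}$)
We call it “backpropagation (BP)” as it is performed backward (downward), opposite to the feedforward operation.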
155. ANN: Multi-layer Perceptron
Activation function
Except for the top (output) layer, the activation function should be non-linear for a hidden layer to be effective.
Any monotonically increasing function can be used.
They are typically s-shaped, e.g., logistic sigmoid or tanh
ReLU or leaky ReLU are widely used recently.
ReLU: $f(z) = \max(0, z)$
Leaky ReLU: $f(z) = \max(\alpha z, z)$ with $0 < \alpha \ll 1$
156. ANN Issues: Convergence to local minima
Convergence to local minima
NN is a non-linear model and the cost J is not convex.
The number of minima/maxima is not known
Gradient search does not guarantee the convergence to
the global minimum
The local minimum we get depends on the initial setting of W
There are no systematic approaches to achieve the global minimum yet.
Simulated annealing and genetic algorithms were proposed as heuristic solutions
157. ANN Issues: Overfitting
Overfitting
An NN model has so many parameters (W(1),W(2),…,W(L))
Deep NN is especially the case
Similar to a linear model where N << M, an NN with too many parameters may easily be overfitted to the training data
Three approaches to relieve overfitting
Noise injection: increasing the number of data by adding noise reduces the generalization error (to some extent)
Regularization technique: adding an L1/L2 penalty to the cost function has a similar impact to noise injection
Dropout ?
158. ANN Issues: Overfitting
Dropout: avoiding co-adaptation of neurons
Useful for Convolutional NN (for image)
At each training phase (for a batch of samples), we
randomly select a portion of neurons (with probability p)
and disable them
Can avoid many neurons co-adapted to each other (avoid
many neurons activated to similar data)
Many NN packages support dropout layer as an option
159. ANN Issues: Vanishing gradient
Vanishing gradient problem
This is also a typical problem in deep neural networks.
BP (training) starts from the top layer and runs downward one-by-one, recursively.
Recall: $\nabla_{\mathbf{W}^{(l)}} J = \boldsymbol{\delta}^{(l)}\,(\tilde{\mathbf{h}}^{(l-1)})^T$, where $\boldsymbol{\delta}^{(l)} = f'(\mathbf{o}^{(l)}) \odot \left(\bar{\mathbf{W}}^{(l+1)T}\boldsymbol{\delta}^{(l+1)}\right)$
where $f'(z) = df(z)/dz$
With the sigmoid function, $0 < f'(z) \le 1/4$ (it’s mostly close to 0)
$\boldsymbol{\delta}^{(l)}$’s are computed recursively
As BP runs downward, $\|\boldsymbol{\delta}^{(l)}\|$ gets smaller and smaller, and so does $\nabla_{\mathbf{W}^{(l)}} J$
→ vanishing gradient
If the NN has many layers, the effective learning rate in the bottom layers gets very small, i.e., neurons in the bottom layers are hardly trained or take too much time to be trained
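The shrinking effect can be illustrated numerically with a deliberately simplified sketch: a chain of single-neuron layers, all with the same weight and the same pre-activation value, so that each backward step multiplies the error signal by the weight times the sigmoid derivative. The scalar chain, the unit weight and the function names are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # never exceeds 1/4 (attained at z = 0)


def delta_scale(num_layers, z=0.0, w=1.0):
    """Factor by which the error signal shrinks after num_layers backward steps."""
    scale = 1.0
    for _ in range(num_layers):
        scale *= w * sigmoid_prime(z)
    return scale


for depth in (1, 5, 10, 20):
    print(depth, delta_scale(depth))
```

Even in the best case for the sigmoid (z = 0), ten layers attenuate the signal by a factor of 0.25 to the power 10, which is below one millionth.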
160. ANN Issues: Vanishing gradient
Vanishing gradient problem
Using ReLU or leaky ReLU may help alleviate vanishing
gradient problem.
Unsupervised learning based pre-training of bottom layers
was proposed, though not so widely used recently.
161. ANN Issues: Building NN model
To build a neural network model, you need to consider first
Input and output dimension?
How many layers? ($L$)
How many neurons for each layer? ($N_l$)
Activation function? (sigmoid, tanh, ReLU or leaky ReLU)
Dropout layer? With what probability? (p)
What cost function? (RSS or cross-entropy)
Which optimizer to use? (simple SGD with or without momentum, etc.)
Batch size?
Regression or classification? (For regression, the top layer activation is typically set to linear)
162. ANN Issues: Training NN model
When training NN, you need to check
Overfitting (compare performance with training and test
data while training the model)
Vanishing gradient (check if training takes too much time)
Convergence to bad local minima (you can train many times
or train multiple instances in parallel with different initial
values)
Computer Lab.
Practice: ML_practice3_NN_ex.ipynb
164. Ch.10 Recurrent neural network
Topics
1. Model structure and operation.
2. RNN Training: backpropagation through time (BPTT)
3. LSTM (long/short term memory)
165. Roadmap
Ch.4 Linear regression → Ch.6 Ridge/Lasso regression → Ch.7 Logistic regression → Ch.8 Multi-task regression → Ch.9 Neural network → Ch.10 Recurrent NN → Ch.11 Convolutional NN
166. RNN: Recurrent neural network
Features
Recurrence means the output is fed back to the input
Necessarily, the input is time-series data
The example on the right consists of two layers
The hidden layer output is fed back to the input with one sample delay (D)
Layer 2 has no feedback loop (conventional NN layer)
Main applications are speech recognition and language modelling (machine translation, sentence completion), where data is given as time series
$\mathbf{h}^{(t)} = f(\mathbf{U}\mathbf{x}^{(t)} + \mathbf{V}\mathbf{h}^{(t-1)})$
$\mathbf{y}^{(t)} = f(\mathbf{W}\mathbf{h}^{(t)})$
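167. RNN: Recurrent neural network
Model
Consider a 1-layer RNN for simplicity
Input: $\mathbf{x}^{(t)}$ (time-series)
Output (state): $\mathbf{h}^{(t)}$ (time-series)
Feedforward operation:
$\mathbf{h}^{(t)} = f(\mathbf{U}\mathbf{x}^{(t)} + \mathbf{V}\mathbf{h}^{(t-1)})$
Output depends on both $\mathbf{x}^{(t)}$ and the previous output (state) $\mathbf{h}^{(t-1)}$
Feedforward operation can also be expressed as
$\mathbf{g}^{(t)} = \mathbf{U}\mathbf{x}^{(t)} + \mathbf{V}\mathbf{h}^{(t-1)}$, $\mathbf{h}^{(t)} = f(\mathbf{g}^{(t)})$
Initial condition: Assume $\mathbf{h}^{(0)} = \mathbf{0}$, so that
$\mathbf{h}^{(1)} = f(\mathbf{U}\mathbf{x}^{(1)})$
[Figure (a) RNN with a loop: x(t) enters through U, h(t-1) is fed back through V after a delay D, their sum g(t) passes through f(·) to give h(t)]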
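The recurrence can be sketched as a loop over time steps that keeps the previous state and starts from a zero state. Plain lists stand in for vectors and matrices; the one-unit network, the weights and the choice of tanh as activation are illustrative assumptions.

```python
import math


def rnn_forward(xs, U, V, f=math.tanh):
    """Run a 1-layer RNN over a sequence: h(t) = f(U x(t) + V h(t-1)), h(0) = 0."""
    h = [0.0] * len(U)
    states = []
    for x in xs:
        g = [sum(u * xj for u, xj in zip(U_row, x)) +
             sum(v * hj for v, hj in zip(V_row, h))
             for U_row, V_row in zip(U, V)]
        h = [f(v) for v in g]
        states.append(h)
    return states


# Hypothetical one-unit RNN fed with an impulse followed by zeros.
states = rnn_forward([[1.0], [0.0], [0.0]], U=[[1.0]], V=[[0.5]])
print(states)
```

The state after the impulse decays step by step because it is only carried through the feedback weight, which previews the "forgetting" discussed later in the chapter.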
168. RNN: Recurrent neural network
Unfolded model
[Figure (a) RNN with a loop: x(t) enters through U, h(t-1) is fed back through V after a delay D, their sum g(t) passes through f(·) to give h(t); shown next to its unfolded (in time) version]
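170. RNN: Training
RNN Training (textbook 10.2)
Gradient of $J^{(t)}$ (cost at time $t$) w.r.t. $\mathbf{U}$:
$\nabla_{\mathbf{U}} J^{(t)} = \sum_{\tau=1}^{t} \left[\left(\frac{\partial \mathbf{g}^{(t)}}{\partial \mathbf{g}^{(\tau)}}\right)^T \boldsymbol{\delta}^{(t)}\right] \mathbf{x}^{(\tau)T}$
where $\boldsymbol{\delta}^{(t)} = \partial J^{(t)}/\partial \mathbf{g}^{(t)}$, $\mathbf{g}^{(t)} = \mathbf{U}\mathbf{x}^{(t)} + \mathbf{V}\mathbf{h}^{(t-1)}$ and
$\frac{\partial \mathbf{g}^{(t)}}{\partial \mathbf{g}^{(\tau)}} = \frac{\partial \mathbf{g}^{(t)}}{\partial \mathbf{g}^{(t-1)}} \frac{\partial \mathbf{g}^{(t-1)}}{\partial \mathbf{g}^{(t-2)}} \cdots \frac{\partial \mathbf{g}^{(\tau+1)}}{\partial \mathbf{g}^{(\tau)}}$, with $\frac{\partial \mathbf{g}^{(k)}}{\partial \mathbf{g}^{(k-1)}} = \mathbf{V}\,\mathrm{diag}\left(f'(\mathbf{g}^{(k-1)})\right)$
In the same way, for the gradient w.r.t. $\mathbf{V}$, we have
$\nabla_{\mathbf{V}} J^{(t)} = \sum_{\tau=1}^{t} \left[\left(\frac{\partial \mathbf{g}^{(t)}}{\partial \mathbf{g}^{(\tau)}}\right)^T \boldsymbol{\delta}^{(t)}\right] \mathbf{h}^{(\tau-1)T}$
To update $\mathbf{U}$ and $\mathbf{V}$, we need to perform BP through time (from $t$ to $1$).
We call it backpropagation through time (BPTT).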
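171. RNN: Training
Vanishing and exploding gradient
Looking at $\nabla_{\mathbf{U}} J^{(t)}$ (and also $\nabla_{\mathbf{V}} J^{(t)}$), the gradient contains
$\frac{\partial \mathbf{g}^{(t)}}{\partial \mathbf{g}^{(t-k)}} = \prod_{j=1}^{k} \mathbf{V}\,\mathrm{diag}\left(f'(\mathbf{g}^{(t-j)})\right)$
where for any activation function we considered,
$\|\mathrm{diag}(f'(\mathbf{g}))\| \le 1$ (matrix norm)
Assuming $\|\mathbf{V}\| = \gamma$, we have
$\left\|\frac{\partial \mathbf{g}^{(t)}}{\partial \mathbf{g}^{(t-k)}}\right\| \le \gamma^k$ (mostly $\gamma < 1$, why?)
As $k \to \infty$, the l.h.s. goes to 0 if $\gamma < 1$ (vanishing gradient), while it may grow without bound if $\gamma > 1$ (exploding gradient)
The latter seldom occurs.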
172. RNN: Training
Forgets past inputs/outputs quickly
We also have, for $k > 0$,
$\frac{\partial \mathbf{h}^{(t)}}{\partial \mathbf{x}^{(t-k)}} = \mathrm{diag}\left(f'(\mathbf{g}^{(t)})\right) \frac{\partial \mathbf{g}^{(t)}}{\partial \mathbf{g}^{(t-k)}}\, \mathbf{U}$
An RNN is supposed to memorize the past inputs (in the system state) to deal with time-series data.
With $\|\mathbf{V}\| < 1$, however, $\|\partial \mathbf{h}^{(t)}/\partial \mathbf{x}^{(t-k)}\| \to 0$ as $k$ gets large.
This means the system forgets past inputs quickly.
There are many examples where we need long-term memory to correctly catch what exactly a sentence means.
173. RNN: Training
RNN summary
Due to its recurrent nature, RNN training requires backpropagation through time (to t=1)
If T gets large, the gradient may vanish or explode. The training rule should be carefully tuned.
In most cases, vanishing gradient occurs more frequently than exploding gradient
One solution to avoid the vanishing/exploding gradient problem is to perform BPTT only for a finite length of time window (unfolded model of finite length)
174. RNN: LSTM
Long short-term memory (LSTM)
A variant of RNN (proposed in 1997) to solve (partly) the vanishing gradient problem and to make the system memory longer.
Vanilla RNN and LSTM
3 gates (forget/input/output gate) + main path
Two separate states: $\mathbf{h}^{(t)}$ and $\mathbf{c}^{(t)}$
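176. RNN: LSTM
LSTM operation
Cell state update (gates $\mathbf{z}_f^{(t)}$, $\mathbf{z}_i^{(t)}$, $\mathbf{z}_o^{(t)}$; $\odot$ is the element-wise product):
$\mathbf{c}^{(t)} = \mathbf{z}_f^{(t)} \odot \mathbf{c}^{(t-1)} + \mathbf{z}_i^{(t)} \odot f(\mathbf{U}\mathbf{x}^{(t)} + \mathbf{V}\mathbf{h}^{(t-1)})$ (long-term memory)
$\mathbf{h}^{(t)} = \mathbf{z}_o^{(t)} \odot f(\mathbf{c}^{(t)})$ (short-term memory, final output)
When ignoring the gating functions, $\mathbf{c}^{(t)}$ is simply a sum of $\mathbf{c}^{(t-1)}$ and the new input $f(\mathbf{U}\mathbf{x}^{(t)} + \mathbf{V}\mathbf{h}^{(t-1)})$ → can keep long-term memory
$\mathbf{z}_f^{(t)}$ selects important features from the previous state $\mathbf{c}^{(t-1)}$, which comprise a part of the current output $\mathbf{c}^{(t)}$.
$\mathbf{z}_i^{(t)}$ selects important features from the new input (output of vanilla RNN), which comprise another part of the current output.
$\mathbf{z}_o^{(t)}$ controls what features in $\mathbf{c}^{(t)}$ to pass to the output $\mathbf{h}^{(t)}$.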
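177. RNN: LSTM
LSTM operation
Gating function ($s(\cdot)$ is the logistic sigmoid):
$\mathbf{z}_f^{(t)} = s(\mathbf{U}_f\mathbf{x}^{(t)} + \mathbf{V}_f\mathbf{h}^{(t-1)})$ (forget gate)
$\mathbf{z}_i^{(t)} = s(\mathbf{U}_i\mathbf{x}^{(t)} + \mathbf{V}_i\mathbf{h}^{(t-1)})$ (input gate)
$\mathbf{z}_o^{(t)} = s(\mathbf{U}_o\mathbf{x}^{(t)} + \mathbf{V}_o\mathbf{h}^{(t-1)})$ (output gate)
Parameters of the three gates ($\mathbf{U}_f, \mathbf{V}_f, \mathbf{U}_i, \mathbf{V}_i, \mathbf{U}_o, \mathbf{V}_o$) are obtained through BPTT, too.
i.e., LSTM learns from the data what features to select from $\mathbf{c}^{(t-1)}$ (long-term memory) and from $f(\mathbf{U}\mathbf{x}^{(t)} + \mathbf{V}\mathbf{h}^{(t-1)})$.
Also it learns what features in $\mathbf{c}^{(t)}$ to pass to the final output $\mathbf{h}^{(t)}$.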
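One LSTM time step can be sketched as below, assuming the usual form: three sigmoid gates computed from the current input and the previous output, a tanh candidate from the vanilla-RNN path, and element-wise products for the cell and output updates. Bias terms are omitted, and the parameter names (`Uf`, `Vf`, and so on), the dictionary layout and the zero-valued example weights are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(U, V, x, h):
    """Compute U x + V h with plain lists (no bias terms in this sketch)."""
    return [sum(u * a for u, a in zip(U_row, x)) + sum(v * b for v, b in zip(V_row, h))
            for U_row, V_row in zip(U, V)]


def lstm_step(x, h_prev, c_prev, p):
    z_f = [sigmoid(v) for v in affine(p["Uf"], p["Vf"], x, h_prev)]   # forget gate
    z_i = [sigmoid(v) for v in affine(p["Ui"], p["Vi"], x, h_prev)]   # input gate
    z_o = [sigmoid(v) for v in affine(p["Uo"], p["Vo"], x, h_prev)]   # output gate
    new = [math.tanh(v) for v in affine(p["U"], p["V"], x, h_prev)]   # vanilla RNN path
    c = [f * cp + i * n for f, cp, i, n in zip(z_f, c_prev, z_i, new)]  # long-term memory
    h = [o * math.tanh(cv) for o, cv in zip(z_o, c)]                    # short-term memory
    return h, c


# Hypothetical one-unit cell with all weights zero: every gate equals 0.5.
zero = [[0.0]]
params = {name: zero for name in ("Uf", "Vf", "Ui", "Vi", "Uo", "Vo", "U", "V")}
h1, c1 = lstm_step([1.0], [0.0], [2.0], params)
print(h1, c1)
```

With the gates stuck at 0.5 the cell state is simply halved at each step; trained gates can push the forget gate towards 1 so that the cell state is carried almost unchanged over many steps.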
178. RNN: Building RNN/LSTM model
Unfolded RNN/LSTM model
You can add an NN layer on top of the RNN/LSTM cell.
[Figure: unfolded model of K RNN/LSTM cells (cell 1, cell 2, …, cell K-1, cell K) chained through delays D, with inputs x(t-K), x(t-K+1), …, x(t-1), x(t) and outputs y(t-K), y(t-K+1), …, y(t-1), y(t)]
Computer Lab.
Practice 1: ML_practice4_RNN_seq_pred.ipynb
Practice 2: ML_practice5_RNN_hihello.ipynb
180. Ch.11 Convolutional neural network
Topics
1. Features of CNN
2. CNN Model
Convolution sublayer
Activation function sublayer
Pooling sublayer
3. CNN Training
181. Roadmap
Ch.4 Linear regression → Ch.6 Ridge/Lasso regression → Ch.7 Logistic regression → Ch.8 Multi-task regression → Ch.9 Neural network → Ch.10 Recurrent NN → Ch.11 Convolutional NN
182. CNN: Convolutional neural network
Image/Vision classification and object detection
An image has a 2D (matrix) or 3D (tensor) structure (i.e., RGB)
Information is contained in a pixel, an element of a matrix (2D image) or a tensor (2D images for RGB or 2D images captured with 2 cameras).
Nearby pixels (values) are highly correlated
→ patterns in an image can be identified by the correlations between nearby pixels
→ nearby pixels must be processed as a chunk
Identifying patterns in an image is “translation invariant” and “size invariant” (we can identify the same pattern wherever it is located and whatever its size is).
Sometimes, it should also be rotation invariant.
183. CNN: Convolutional neural network
CNN for image/vision data
CNN is a special NN designed for image/vision data.
Can be used for image classification, object detection,
depth estimation, etc.
It processes a chunk of nearby pixels simultaneously
(receptive field)
We will see how it provides object (pattern) detection with translation invariance.
Size invariance can be provided by the multi-layer structure
Rotation invariance?
184. CNN Model
(example) configuration of CNN
Two convolution NN layers and 3 (FC) NN layers.
Convolution NN layers are divided into sublayers: convolution sublayer (denoted by CX) and pooling sublayer (denoted by SX)
(FC) NN layers are C5, F6 and output (C5 looks like an FC NN)
source: Proc. of IEEE, Nov. 1998 by Y. LeCun, et al.
185. CNN Model – Convolution layer
CNN model (convolution layer)
For convenience, we divide it into 3 sublayers.
Convolution sublayer
Activation function sublayer
Pooling sublayer
Activation function sublayer is the same as in conventional NN
Dropout can also be applied as in fully connected NN layer
186. CNN Model – Convolution layer
CNN model (convolution layer)
A conventional NN layer has a 1-dimensional array of neurons, while a CNN layer has a 3-dimensional array (width, height and depth), where the depth index is called “channel”
Input to a CNN layer is also 3-dimensional, e.g., 2D images with RGB (3 channels)
Denote the 3-d input and output of the $l$th CNN layer as $\mathbf{X}^{(l)} = \{\mathbf{X}_i^{(l)}\}$ and $\mathbf{Y}^{(l)} = \{\mathbf{Y}_j^{(l)}\}$, where $i$ and $j$ are channel indices.
187. CNN Model – Convolution layer
CNN model (convolution layer)
Operations of the three sublayers are (with $\mathbf{X}_i^{(l)}$ the $i$th input channel and $\mathbf{Y}_j^{(l)}$ the $j$th output channel of the $l$th layer)
Convolution sublayer: $\mathbf{Z}_j^{(l)} = \sum_i \mathbf{W}_{j,i}^{(l)} * \mathbf{X}_i^{(l)}$
Activation function sublayer: $\mathbf{A}_j^{(l)} = f(\mathbf{Z}_j^{(l)})$
Pooling sublayer: $\mathbf{Y}_j^{(l)} = \mathrm{pool}(\mathbf{A}_j^{(l)})$
Input and output sizes are the same only for the AF sublayer.
The other two sublayers have different input and output sizes.
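188. CNN Model – Convolution layer
Convolution sublayer
$\mathbf{Z}_j^{(l)} = \sum_i \mathbf{W}_{j,i}^{(l)} * \mathbf{X}_i^{(l)}$
$\mathbf{W}_{j,i}^{(l)}$ is the weight matrix (filter) between the $i$th channel of the input and the $j$th channel of the output.
$*$ is 2-d convolution, with which the $(m,n)$th element of $\mathbf{W}_{j,i}^{(l)} * \mathbf{X}_i^{(l)}$ is given by
$\sum_{(p,q) \in R(m,n)} w^{(l)}_{p',q'}\, x^{(l)}_{p,q}$, where $(p',q')$ is the position of $(p,q)$ within $R(m,n)$
$R(m,n)$ is the “receptive field” of the $(m,n)$th neuron
[Figure: 2-d array of input signal of the ith input channel mapped to the 2-d array of neurons of the jth output channel]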
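The convolution sublayer can be sketched with nested loops: each output neuron sums, over all input channels, the element-wise product of a filter with its receptive field. The sketch assumes no padding, square filters and the correlation-style indexing commonly used in CNNs (no filter flip); the function name and the tiny example are illustrative.

```python
def conv2d(X, W, stride=1):
    """X: list of input channels (2-d lists); W[j][i]: filter from input i to output j."""
    H, Wd = len(X[0]), len(X[0][0])
    F = len(W[0][0])                       # filter is F x F
    H_out = (H - F) // stride + 1
    W_out = (Wd - F) // stride + 1
    Z = []
    for W_j in W:                          # one output channel per filter set
        Z_j = [[0.0] * W_out for _ in range(H_out)]
        for m in range(H_out):
            for n in range(W_out):
                acc = 0.0
                for X_i, W_ji in zip(X, W_j):      # sum over input channels
                    for p in range(F):             # receptive field of neuron (m, n)
                        for q in range(F):
                            acc += W_ji[p][q] * X_i[m * stride + p][n * stride + q]
                Z_j[m][n] = acc
        Z.append(Z_j)
    return Z


# One 3x3 input channel and one 2x2 filter that adds the two diagonal pixels.
X_img = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
W_flt = [[[[1.0, 0.0], [0.0, 1.0]]]]
Z_out = conv2d(X_img, W_flt)
print(Z_out)
```

All neurons of an output channel reuse the same filter, so the number of trainable weights depends on the filter size and the channel counts, not on the image size.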
189. CNN Model – Convolution layer
Convolution sublayer
Each filter responds to a certain pattern within a receptive field on the input.
Filter examples: three filters of size 5x5 respond to different patterns (diamond, T and diagonal, respectively)
The filter coefficients are obtained through CNN training and, in general, they are real values.
190. CNN Model – Convolution layer
Convolution sublayer example (2 input ch., 3 output ch.)
All the neurons of a channel share the same weight matrices
A channel (2D array) is a feature map containing information
of a (combination of) specific pattern(s) defined by weight
matrices; (information on location and existence)
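191. CNN Model – Convolution layer
Convolution sublayer
Configuration parameters
• $s$: stride, $F^{(l)} \times F^{(l)}$: size of the weight matrix (2-d)
• width, height and number of channels of the input and of the output of the $l$th layer: size of input and output (3-d)
$s$, $F^{(l)}$ and the input/output sizes must be set properly
The number of weight matrices (filters) to train is (number of input channels) x (number of output channels), one for each input/output channel pair
In general, the output width and height are no larger than those of the input, while the number of channels typically grows
[Figure: 2-d array of input signal of the ith input channel mapped to the 2-d array of neurons of the jth output channel]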
192. CNN Model – Convolution layer
Activation function sublayer
$\mathbf{A}_j^{(l)} = f(\mathbf{Z}_j^{(l)})$
The output of the convolution sublayer, $\mathbf{Z}_j^{(l)}$, is then passed through an activation function.
ReLU or leaky ReLU are typically used.
The output $\mathbf{A}_j^{(l)}$ has the same size as that of the input.
193. CNN Model – Convolution layer
Pooling sublayer
$\mathbf{Y}_j^{(l)} = \mathrm{pool}(\mathbf{A}_j^{(l)})$
The pooling sublayer down-samples the sublayer input, $\mathbf{A}_j^{(l)}$.
While doing so, it also summarizes the data.
Let $r$ be the down-sample ratio. Each channel of the input is partitioned into $r \times r$ areas (pooling areas), in which an $r \times r$ array of numbers is summarized into a scalar.
Two types: max-pooling (takes the maximum value) and average-pooling (takes the average of values) → output size is $1/r^2$ of input size
194. CNN Model – Convolution layer
Pooling sublayer
Pooling operation can be expressed as
$y^{(l)}_{m,n} = \max_{(p,q) \in P(m,n)} a^{(l)}_{p,q}$ (max-pooling)
$y^{(l)}_{m,n} = \frac{1}{r^2}\sum_{(p,q) \in P(m,n)} a^{(l)}_{p,q}$ (average-pooling)
$P(m,n)$ is the pooling area of the $(m,n)$th output.
Pooling reduces the computational burden, e.g., with $r = 2$, the number of parameters to train is reduced to 1/4.
If $r$ is too large, however, important information can be lost.
Better to apply pooling multiple times with small $r$.
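Max- and average-pooling over non-overlapping square areas can be sketched as follows. The sketch assumes the channel height and width are multiples of the down-sample ratio; the function name `pool2d` and the 4x4 example are illustrative.

```python
def pool2d(A, r, mode="max"):
    """Summarize each non-overlapping r x r area of a 2-d channel into one scalar."""
    H, W = len(A), len(A[0])
    out = []
    for m in range(H // r):
        row = []
        for n in range(W // r):
            area = [A[m * r + p][n * r + q] for p in range(r) for q in range(r)]
            row.append(max(area) if mode == "max" else sum(area) / len(area))
        out.append(row)
    return out


A_ch = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]
y_max = pool2d(A_ch, 2, "max")
y_avg = pool2d(A_ch, 2, "avg")
print(y_max, y_avg)
```

With a ratio of 2 the 4x4 channel becomes 2x2, i.e., a quarter of the original number of values.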
195. CNN training
CNN training
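The parameters to optimize are the weight matrices $\mathbf{W}_{ij}^{(l)}$
for every input channel $i$ and output channel $j$ of each convolution layer $l$.
Similar to a conventional NN, we apply the chain rule to compute
the gradient w.r.t. $\mathbf{W}_{ij}^{(l)}$.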
Differences from conventional NN
1. 3-D (cubic) arrays of neurons
2. partial connection & weight sharing in conv. sublayer
3. passing gradient through pooling sublayer.
See textbook section 11.3 for detail
196. CNN training
Improving performance of CNN
Apply dropout to avoid co-adaptation between channels
Data normalization: adjust mean (brightness) and variance
(contrast) of image to make them fall within predefined ranges
Batch normalization: normalize data for each batch at each layer
Data augmentation: increase the data set by resizing and/or
rotating the original images, for size/rotation invariance
198. Machine Learning and Neural Network
Ch.12/13: Unsupervised learning:
Clustering and data visualization
Seokhyun Yoon, Electronics Eng., Dankook University
199. Ch.12/13 Clustering and data visualization
Topics
1. Clustering
Partitioning (centroid) based clustering: k-means algorithm
Hierarchical (connectivity based) clustering and dendrogram
Density based clustering
Distribution based clustering
2. EM algorithm for Gaussian Mixture Model (Ch.13)
3. Data visualization using non-linear mapping: t-SNE
200. Clustering and data visualization
Clustering
Data without label: $X = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ where $\mathbf{x}_n \in \mathbb{R}^d$
Objective is to divide data into a set of groups
based on some similarity measures
Need to devise procedures to efficiently group data
Data (distribution) visualization to check clusters
Typical similarity measures:
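Euclidean distance: $d(\mathbf{x}_i, \mathbf{x}_j) = \|\mathbf{x}_i - \mathbf{x}_j\|$
Correlation: $\rho(\mathbf{x}_i, \mathbf{x}_j) = \frac{\mathbf{x}_i^T \mathbf{x}_j}{\|\mathbf{x}_i\| \, \|\mathbf{x}_j\|}$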
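The two similarity measures can be sketched in a few lines. The correlation is taken here as the inner product divided by the product of the norms, which is an assumption about the slide's intended (uncentered) definition; the function names are illustrative.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def correlation(a, b):
    """Inner product of a and b normalized by their norms (uncentered)."""
    dot = sum(u * v for u, v in zip(a, b))
    norm_a = math.sqrt(sum(u * u for u in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    return dot / (norm_a * norm_b)
```

A small distance or a correlation close to 1 both indicate similar samples, but the correlation ignores the overall scale of the vectors.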
201. Clustering and data visualization
Four approaches to clustering
Partitioning (centroid) based clustering: k-means
Hierarchical (connectivity based) clustering
Density based clustering
Distribution based clustering: Gaussian Mixture Model and
EM algorithm (ch.13)
202. Partitioning (centroid) based Clustering: k-means
Partitioning (centroid) based clustering
The feature space is
partitioned into Voronoi
regions, where each region
is represented by a
centroid.
Based on the Euclidean distance
measure, the points in a
Voronoi region are those
closest to that centroid
The k-means (Lloyd) algorithm
searches for the centroids of a
pre-defined number of
regions to partition.
203. Partitioning (centroid) based Clustering: k-means
K-means clustering (Lloyd’s algorithm)
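Input: $X = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$, $K$: the number of clusters to find
Initialization: randomly select $K$ samples to use them as
centroids, $\boldsymbol{\mu}_1, \dots, \boldsymbol{\mu}_K$
1) Determine class members $S_k$:
Set $S_k = \emptyset$ for $k = 1, \dots, K$
For all samples $\mathbf{x}_n$, do
$k^* = \arg\min_{k \in \{1, 2, \dots, K\}} \|\mathbf{x}_n - \boldsymbol{\mu}_k\|$
$S_{k^*} \leftarrow S_{k^*} \cup \{\mathbf{x}_n\}$
2) Update centroid:
$\boldsymbol{\mu}_k = \frac{1}{|S_k|} \sum_{\mathbf{x} \in S_k} \mathbf{x}$ (mean of its members)
Repeat 1) and 2) until $S_k$ doesn't change any more
Output: $S_k$, cluster label for all $\mathbf{x}_n$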
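The two alternating steps of Lloyd's algorithm can be sketched as follows. This is a dependency-free illustration, not an optimized implementation; the names `kmeans`, `max_iter` and `seed` are assumptions, and an empty cluster simply keeps its previous centroid.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(samples, k, max_iter=100, seed=0):
    """Return (centroids, labels) found by Lloyd's algorithm."""
    rng = random.Random(seed)
    # Initialization: k randomly selected samples serve as centroids.
    centroids = [list(c) for c in rng.sample(samples, k)]
    labels = None
    for _ in range(max_iter):
        # 1) Determine class members: nearest centroid for every sample.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(x, centroids[c]))
            for x in samples
        ]
        if new_labels == labels:  # memberships unchanged: converged
            break
        labels = new_labels
        # 2) Update each centroid to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, labels
```

Because the result depends on the random initialization, running the algorithm several times with different seeds and keeping the best partition is a common precaution.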
204. Partitioning (centroid) based Clustering: k-means
Partitioning (centroid) based clustering
K-means algorithm was originally proposed for vector
quantization
The clusters found can be quite different from our
expectation, especially when the sizes of the true clusters
are quite different
Source: https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm
205. Hierarchical (connectivity based) clustering
Hierarchical clustering
cluster hierarchy represented by a dendrogram (a binary
tree representing similarity between clusters).
In the tree, a node is a cluster and a leaf node is a sample
Two approaches to build dendrogram: top-down (divisive) or
bottom-up (agglomerative)
[Figure: dendrogram with root node, nodes and leaf nodes; samples are labelled by (BRCA) tumor category, features are gene names]
206. Hierarchical (connectivity based) clustering
Bottom-up (agglomerative) approach
Initially, each sample is set as a cluster (leaf node) having only
one member.
1) Compute “inter-cluster distances” for every pair of clusters
(nodes without parent).
2) Select the pair with smallest distance and merge them to one.
(add a node in the tree connecting the two nodes)
Repeat 1) and 2) until only one cluster is left
Source: https://www.researchgate.net/publication/273456906_Cluster_Analysis_to_Understand_Socio-Ecological_Systems_A_Guideline/figures?lo=1
207. Hierarchical (connectivity based) clustering
Bottom-up (agglomerative) approach
Inter-cluster distances:
distance between two clusters
It can be defined as
minimum (single-linkage)
average (average-linkage)
maximum (complete-linkage)
of distances between every pair
of members (one from each
cluster)
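The bottom-up procedure and the three inter-cluster distances can be sketched together. This is a brute-force illustration that recomputes every pairwise distance at each merge; the function names and the returned list of merges are assumptions for illustration.

```python
import math


def linkage_distance(a, b, linkage="single"):
    """Inter-cluster distance from all pairs of members (one from each)."""
    d = [math.dist(p, q) for p in a for q in b]
    if linkage == "single":      # minimum
        return min(d)
    if linkage == "complete":    # maximum
        return max(d)
    return sum(d) / len(d)       # average


def agglomerative(samples, linkage="single"):
    """Merge clusters bottom-up; return a list of (left, right, distance)."""
    clusters = [[x] for x in samples]  # each sample starts as a cluster
    merges = []
    while len(clusters) > 1:
        # 1) Inter-cluster distance for every pair of current clusters,
        # 2) select the pair with the smallest distance.
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage_distance(
            clusters[ij[0]], clusters[ij[1]], linkage))
        dist = linkage_distance(clusters[i], clusters[j], linkage)
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return merges
```

The sequence of merges and their distances is exactly the information a dendrogram displays: each merge is a node, and its distance is the height at which the two branches join.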
208. Hierarchical (connectivity based) clustering
Top-down (divisive) approach
Initially, we have only one cluster having all the samples as
its members. (root node)
1) Select the cluster having the highest “intra-cluster distance”
(for example)
2) Apply k-means clustering to divide it into two.
Repeat 1) and 2) until every cluster has only one member.
Another name of this is “hierarchical k-means”
209. Density based clustering
Density based clustering
A cluster is defined as a set of samples that lie within a
relatively dense area.
Clusters are separated by sparse areas.
Useful when clusters are not centralized (not radially
distributed)
Two well-known algorithms: DBSCAN and OPTICS
Source: https://untitledtblog.tistory.com/146
210. Density based clustering
Density based clustering: DBSCAN
Two parameters: $\varepsilon$ (dist. threshold) and $MinPts$ (# of points)
Definition (core point): a point from which there are at
least $MinPts$ points within a distance $\varepsilon$.
First, divide all the points into core and non-core points.
Assign cluster # to core points
1) Select a core point x whose cluster is not assigned yet.
2) Find all the core points that can be connected to it through a chain of core points, each within a distance
$\varepsilon$ of the next, and assign the same cluster # to these core point(s)
3) Repeat 1) and 2) to find all the core point clusters
Assign cluster # to non-core points
1) For each non-core point, find the closest core point within the
distance $\varepsilon$ and set its cluster to the cluster # of that core point.
2) If there is no core point within $\varepsilon$, it is simply regarded as an outlier.
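The procedure above can be sketched in plain Python. Two points are assumptions not fixed by the slide: a point counts itself among its neighbours when testing the core-point condition, and outliers receive the label -1. The names `dbscan`, `eps` and `min_pts` are illustrative.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster label per point (-1 marks an outlier)."""
    n = len(points)
    # Neighbours within eps (each point is its own neighbour: assumption).
    neighbors = [[j for j in range(n)
                  if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = [None] * n
    cluster = 0
    # Assign cluster numbers to core points.
    for i in range(n):
        if not core[i] or labels[i] is not None:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # follow chains of core points within eps
            p = stack.pop()
            for q in neighbors[p]:
                if core[q] and labels[q] is None:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    # Assign cluster numbers to non-core points.
    for i in range(n):
        if core[i]:
            continue
        near_cores = [j for j in neighbors[i] if core[j]]
        if near_cores:
            closest = min(near_cores,
                          key=lambda j: math.dist(points[i], points[j]))
            labels[i] = labels[closest]
        else:
            labels[i] = -1  # no core point within eps: outlier
    return labels
```

Unlike k-means, the number of clusters is not an input here; it follows from the density of the data and the two parameters.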
211. Distribution based clustering
Distribution based clustering: Mixture model
Use a PDF model (with parameters) to approximate
probability distribution of clusters
The data distribution is modelled by a mixture of the PDFs
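A well-known, mathematically tractable one is the Gaussian
mixture model (GMM), in which the data distribution is
modelled by
$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_k, \mathbf{C}_k)$
where $k$ is the cluster index and $K$ is the number of clusters
The objective is to find the optimal model parameters $\pi_k$, $\boldsymbol{\mu}_k$, $\mathbf{C}_k$
for $k = 1, \dots, K$ that best fit the given data set.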
212. Gaussian mixture model & EM algorithm
Gaussian mixture model
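$p(\mathbf{x}) = \sum_{k=1}^{K} p(z = k) \, p(\mathbf{x} \mid z = k) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_k, \mathbf{C}_k)$
where $z$ is the cluster index and $K$ is the number of clusters
$z$ is a latent (hidden) variable
The objective is to find the optimal model parameters $\pi_k$, $\boldsymbol{\mu}_k$, $\mathbf{C}_k$
for $k = 1, \dots, K$ that best fit the given data set.
Issues
May use likelihood as objective function
$L(\theta) = p(X \mid \theta) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n; \boldsymbol{\mu}_k, \mathbf{C}_k)$
($\theta$ consists of $\{\pi_k, \boldsymbol{\mu}_k, \mathbf{C}_k\}$’s)
Not easy to maximize as $p(X \mid \theta)$ contains a summation due to
the latent variable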
213. Gaussian mixture model & EM algorithm
EM algorithm (in general)
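Use conditional likelihood given $\mathbf{z}$, i.e., assume $\mathbf{z}$
(the cluster of each sample $\mathbf{x}_n$) is fixed
Define conditional likelihood
$L(\theta; X, \mathbf{z}) = p(X, \mathbf{z} \mid \theta)$
With this, we iteratively find $\mathbf{z}$ and $\theta$
Steps
Initialize $\theta^{(0)}$ and do the following until convergence
1) E-step: $Q(\theta \mid \theta^{(t)}) = E_{\mathbf{z} \mid X, \theta^{(t)}} \left[ \log L(\theta; X, \mathbf{z}) \right]$
2) M-step: $\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$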
214. Gaussian mixture model & EM algorithm
EM algorithm for Gaussian mixture model
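Conditional likelihood: $p(X, \mathbf{z} \mid \theta) = \prod_{n=1}^{N} \pi_{z_n} \, \mathcal{N}(\mathbf{x}_n; \boldsymbol{\mu}_{z_n}, \mathbf{C}_{z_n})$
Steps
Input: $X = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$, $K$
Initialize $\theta^{(0)} = \{\pi_k^{(0)}, \boldsymbol{\mu}_k^{(0)}, \mathbf{C}_k^{(0)}\}$
Do the following until convergence
1) E-step: $\gamma_{nk}^{(t)} = \frac{\pi_k^{(t)} \, \mathcal{N}(\mathbf{x}_n; \boldsymbol{\mu}_k^{(t)}, \mathbf{C}_k^{(t)})}{\sum_{j=1}^{K} \pi_j^{(t)} \, \mathcal{N}(\mathbf{x}_n; \boldsymbol{\mu}_j^{(t)}, \mathbf{C}_j^{(t)})}$
2) M-step:
$\pi_k^{(t+1)} = \frac{\sum_{n} \gamma_{nk}^{(t)}}{\sum_{k} \sum_{n} \gamma_{nk}^{(t)}}$
$\boldsymbol{\mu}_k^{(t+1)} = \frac{\sum_{n} \gamma_{nk}^{(t)} \, \mathbf{x}_n}{\sum_{n} \gamma_{nk}^{(t)}}$
$\mathbf{C}_k^{(t+1)} = \frac{\sum_{n} \gamma_{nk}^{(t)} \, (\mathbf{x}_n - \boldsymbol{\mu}_k^{(t+1)}) (\mathbf{x}_n - \boldsymbol{\mu}_k^{(t+1)})^T}{\sum_{n} \gamma_{nk}^{(t)}}$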
(See textbook section 13.2 for detail)
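The E-step and M-step can be illustrated with a one-dimensional sketch, in which each covariance matrix reduces to a scalar variance. This simplification, the fixed number of iterations, the variance floor and all names are assumptions made for illustration only.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-d Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, pis, mus, variances, n_iter=50):
    """Run EM for a 1-d Gaussian mixture; return (pis, mus, variances)."""
    pis, mus, variances = list(pis), list(mus), list(variances)
    k = len(pis)
    for _ in range(n_iter):
        # E-step: responsibility of every component for every sample.
        gammas = []
        for x in xs:
            w = [pis[j] * gaussian_pdf(x, mus[j], variances[j])
                 for j in range(k)]
            total = sum(w)
            gammas.append([wj / total for wj in w])
        # M-step: re-estimate weight, mean and variance per component.
        for j in range(k):
            nj = sum(g[j] for g in gammas)
            pis[j] = nj / len(xs)
            mus[j] = sum(g[j] * x for g, x in zip(gammas, xs)) / nj
            var = sum(g[j] * (x - mus[j]) ** 2
                      for g, x in zip(gammas, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a zero variance
    return pis, mus, variances
```

After convergence, each sample can be assigned to the component with the largest responsibility, which gives the cluster labels.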
215. Gaussian mixture model & EM algorithm
Clustering with GMM
Note
The number of clusters $K$ must be fixed a priori.
Variational EM can find a good number for $K$ implicitly.
See “C. M. Bishop, Pattern Recognition and Machine
Learning, Springer” for variational EM
Source: https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm
217. Non-linear feature dimension reduction: t-SNE
Data (distribution) visualization
Data visualization gives us a lot of information on the data:
the shape of its distribution, the number of separable
clusters, and so on.
One can also check whether clustering was done properly and whether
there are any outliers.
Linear dimension reduction (PCA) is effective if the number
of clusters or the original feature dimension is small
enough.
We discuss a non-linear dimension reduction technique,
t-distributed stochastic neighbor embedding (t-SNE).
218. Non-linear feature dimension reduction: t-SNE
Requirement in general
points close to each other in the original space must also
be close together in the new (low dimensional) space.
The local structure (manifolds) in the original space is kept
in the new space with as little distortion as possible.
Characteristics of t-SNE
It’s a non-linear mapping
Direct mapping: x in the original space → z in the new space,
obtained by solving an optimization problem
If some new data is added, we need to perform
optimization again and the new mapping will be different
from the previous one.
An upgraded version of SNE
219. Non-linear feature dimension reduction: t-SNE
Elements
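Pairwise similarity in the original space: $p_{ij}$
Pairwise similarity in the new space: $q_{ij}$
Cost function: $J(\mathbf{Z})$
Definition
Given data points $\mathbf{x}_1, \dots, \mathbf{x}_N$, let $\mathbf{z}_n$ be the point-wise
mapping of $\mathbf{x}_n$ in the new space.
$p_{ij} = \frac{\exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / 2\sigma^2)}{\sum_{k,l: k \neq l} \exp(-\|\mathbf{x}_k - \mathbf{x}_l\|^2 / 2\sigma^2)}$
$q_{ij} = \frac{(1 + \|\mathbf{z}_i - \mathbf{z}_j\|^2)^{-1}}{\sum_{k,l: k \neq l} (1 + \|\mathbf{z}_k - \mathbf{z}_l\|^2)^{-1}}$
Both $\{p_{ij}\}$ and $\{q_{ij}\}$ are valid PMFs.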
220. Non-linear feature dimension reduction: t-SNE
Cost function: Kullback-Leibler divergence(KLD)
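Cost = KLD between $P = \{p_{ij}\}$ and $Q = \{q_{ij}\}$
$J(\mathbf{Z}) = \sum_{i,j: i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$
$J(\mathbf{Z}) \geq 0$, and $J(\mathbf{Z}) = 0$ if and only if $P = Q$, hold if $P$ and $Q$ are valid PMFs
Optimization
We want to find $\mathbf{Z} = \{\mathbf{z}_1, \dots, \mathbf{z}_N\}$ that minimizes $J(\mathbf{Z})$.
Apply gradient descent, for which the gradient of $J(\mathbf{Z})$
w.r.t. $\mathbf{z}_i$ is given by
$\frac{\partial J(\mathbf{Z})}{\partial \mathbf{z}_i} = 4 \sum_{j} (p_{ij} - q_{ij}) (\mathbf{z}_i - \mathbf{z}_j) (1 + \|\mathbf{z}_i - \mathbf{z}_j\|^2)^{-1}$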
More tricks were applied (see the original paper)
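The similarities, the KL cost and its gradient can be sketched for a handful of points. The sketch assumes a single global Gaussian width `sigma` for the original space and omits the additional tricks of the original paper; all function names are illustrative.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def normalize(w):
    """Scale a matrix of non-negative weights so that all entries sum to 1."""
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


def similarities_x(xs, sigma=1.0):
    """Gaussian pairwise similarities in the original space."""
    n = len(xs)
    return normalize([[math.exp(-sq_dist(xs[i], xs[j]) / (2 * sigma ** 2))
                       if i != j else 0.0 for j in range(n)]
                      for i in range(n)])


def similarities_z(zs):
    """Student-t pairwise similarities in the new space."""
    n = len(zs)
    return normalize([[1.0 / (1.0 + sq_dist(zs[i], zs[j]))
                       if i != j else 0.0 for j in range(n)]
                      for i in range(n)])


def kl_cost(p, q):
    """Kullback-Leibler divergence between the two similarity PMFs."""
    n = len(p)
    return sum(p[i][j] * math.log(p[i][j] / q[i][j])
               for i in range(n) for j in range(n)
               if i != j and p[i][j] > 0)


def gradient(p, q, zs):
    """Gradient of the cost w.r.t. every mapped point z_i."""
    n, dim = len(zs), len(zs[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rate = 1.0 / (1.0 + sq_dist(zs[i], zs[j]))
            coef = 4.0 * (p[i][j] - q[i][j]) * rate
            for d in range(dim):
                grad[i][d] += coef * (zs[i][d] - zs[j][d])
    return grad
```

One gradient-descent step then moves every mapped point against its gradient, scaled by a small learning rate, after which the similarities in the new space are recomputed.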
221. Non-linear feature dimension reduction: t-SNE
Note
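$\frac{\partial J(\mathbf{Z})}{\partial \mathbf{z}_i} = 4 \sum_{j} (p_{ij} - q_{ij}) (\mathbf{z}_i - \mathbf{z}_j) (1 + \|\mathbf{z}_i - \mathbf{z}_j\|^2)^{-1}$
Let $\mathcal{X}$ be the original space and $\mathcal{Z}$ be the new space
(direction of movement of $\mathbf{z}_i$) $= -\partial J(\mathbf{Z}) / \partial \mathbf{z}_i$; for each $j$, the movement in $\mathcal{Z}$ is either toward $\mathbf{z}_j$
or the opposite
The sign is determined by $(p_{ij} - q_{ij})$, i.e., it is toward $\mathbf{z}_j$ if $p_{ij} > q_{ij}$
(similarity in $\mathcal{Z}$ < that in $\mathcal{X}$, or distance in $\mathcal{Z}$ > that in $\mathcal{X}$)
The actual movement is given by the sum over all $j$, which makes
$p_{ij}$ and $q_{ij}$ as close as possible
$(1 + \|\mathbf{z}_i - \mathbf{z}_j\|^2)^{-1}$ can be regarded as the rate of movement
The rate of movement is large if $\mathbf{z}_i$ and $\mathbf{z}_j$ are close together
and vice versa, which keeps the focus on local structure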
222. Non-linear feature dimension reduction: t-SNE
Comparison: PCA versus t-SNE
400 dimensional features mapped to 2-dimensional features
223. Non-linear feature dimension reduction: t-SNE
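Perplexity: setting $\sigma_i$
Perplexity is defined for a point $\mathbf{x}_i$ as $Perp(P_i) = 2^{H(P_i)}$,
where $H(P_i) = -\sum_{j} p_{j|i} \log_2 p_{j|i}$
with $p_{j|i} = \frac{\exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|\mathbf{x}_i - \mathbf{x}_k\|^2 / 2\sigma_i^2)}$
We make the perplexity roughly the same for all $\mathbf{x}_i$, i.e.,
set $\sigma_i$ smaller in dense regions (many points nearby)
set $\sigma_i$ larger in sparse regions (few points nearby)
In this way, the effective number of points nearby is
made roughly the same
Binary search can be used to find $\sigma_i$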
Typical value of perplexity is 5~50
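The binary search for the width of one point can be sketched as follows. The input is the list of squared distances from that point to all other points; the search interval, the iteration count and the function names are assumptions for illustration.

```python
import math


def conditional_probs(sq_dists, sigma):
    """Conditional similarities of one point to all other points."""
    d_min = min(sq_dists)
    # Subtracting the smallest distance leaves the normalized result
    # unchanged but avoids a zero denominator for very small sigma.
    w = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in sq_dists]
    total = sum(w)
    return [v / total for v in w]


def perplexity(probs):
    """2 to the power of the entropy (in bits) of a PMF."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy


def find_sigma(sq_dists, target, lo=1e-3, hi=1e3, n_iter=60):
    """Binary search for the sigma whose perplexity matches the target."""
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        # Perplexity grows with sigma, so shrink the interval accordingly.
        if perplexity(conditional_probs(sq_dists, mid)) > target:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

A point in a dense region has small distances to its neighbours and therefore ends up with a smaller width than a point in a sparse region for the same target perplexity.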