The data deluge we are currently witnessing presents both opportunities and challenges. Never before have so many aspects of our world been so thoroughly quantified, and never before has data been so plentiful. At the same time, the complexity of the analyses required to extract useful information from these piles of data is rapidly increasing, rendering more traditional and simpler approaches unfeasible or unable to provide new insights.
In this tutorial we provide a practical introduction to some of the most important machine learning algorithms that are relevant to the field of Complex Networks in general, with a particular emphasis on the analysis and modeling of empirical data. The goal is to provide the fundamental concepts necessary to make sense of the more sophisticated data analysis approaches that are currently appearing in the literature, and to provide a field guide to the advantages and disadvantages of each algorithm.
In particular, we will cover unsupervised learning algorithms such as K-means and Expectation-Maximization, as well as supervised ones such as Support Vector Machines, Neural Networks and Deep Learning. Participants are expected to have a basic understanding of calculus and linear algebra as well as working proficiency with the Python programming language.
The data deluge we are currently witnessing presents both opportunities and challenges. Never before have so many aspects of our world been so thoroughly quantified, and never before has data been so plentiful. One of the main beneficiaries of the current data glut has been data-intensive approaches such as Neural Networks in general and Deep Learning in particular, with many important new practical applications arising in the last couple of years.
In this tutorial we provide a practical introduction to the most important concepts of Neural Networks as we gradually implement simple neural networks from scratch to perform simple tasks such as character recognition. Our emphasis will be on understanding the architecture, training, advantages and pitfalls of practical neural networks.
Participants are expected to have a basic familiarity with calculus and linear algebra as well as working proficiency with the Python programming language.
Neural networks for word embeddings have received a lot of attention since a team of Google researchers published word2vec in 2013. They showed that the internal state (the embeddings) that the neural network learned by "reading" a large corpus of text preserved semantic relations between words.
As a result, this type of embedding started being studied in more detail and applied to more serious Natural Language Processing (NLP) and Information Retrieval (IR) tasks such as summarization, query expansion, etc.
In this talk we will cover the intuitions and algorithms underlying the word2vec family of algorithms. In the second half of the presentation we will quickly review the basics of TensorFlow and analyze in detail the TensorFlow reference implementation of word2vec.
From list sorting to network routing, and from hash tables to capacity planning, a programmer's daily work is filled with probability. We use probabilistic algorithms, data structures, and systems constantly, often without even thinking about it. Experienced engineers reach for probabilistic algorithms frequently and intentionally, especially when building systems of serious scale. How do probabilistic algorithms actually work in practice? And how do we know they will be safe and reliable in our critical production systems? We'll address those questions, explore a few algorithms, and see why "with high probability" is often better than "exactly".
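One of the probabilistic data structures a talk like this typically covers is the Bloom filter, which answers set-membership queries "with high probability": false positives are possible, false negatives are not. A minimal sketch (the parameters m and k, and deriving the k hash functions by salting a single strong hash, are illustrative choices, not necessarily the talk's):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over a bit array of size m.
    Queries may return false positives, never false negatives."""
    def __init__(self, m=1024, k=3):
        self.m = m              # number of bits
        self.k = k              # number of hash functions
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k positions by salting one strong hash with the index i
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # True only if every position is set; may be a false positive
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for word in ["route", "hash", "sort"]:
    bf.add(word)
print("route" in bf)  # → True (no false negatives)
```

Sizing m and k against the expected number of items is what controls the false-positive rate in practice.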
Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding and recommendation engines. One of the key reasons for this progress is the availability of highly flexible and developer-friendly deep learning frameworks. During this workshop, members of the Amazon Machine Learning team will provide a short background on Deep Learning, focusing on relevant application domains, and an introduction to using the powerful and scalable Deep Learning framework MXNet. At the end of this tutorial you’ll gain hands-on experience targeting a variety of applications, including computer vision and recommendation engines, as well as exposure to using preconfigured Deep Learning AMIs and CloudFormation Templates to help speed your development.
- POSTECH EECE695J, "Deep Learning Basics and Applications to Steel Manufacturing Processes", Week 5
- Contents: Restricted Boltzmann Machine (RBM), various activation functions, data preprocessing, regularization methods, training of a neural network
- Video: https://youtu.be/v4rGPl-8wdo
Photo-realistic Single Image Super-resolution using a Generative Adversarial ... (Hansol Kang)
* Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees (Lorenzo Alberton)
The first part of a series of talks about modern algorithms and data structures, used by NoSQL databases like HBase and Cassandra. An explanation of Bloom Filters and several derivatives, and Merkle Trees.
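As an illustration of the second structure, a Merkle tree commits to a list of blocks with a single root hash, so two replicas can compare roots to detect divergence. A minimal sketch (duplicating the last node on odd levels is one common convention; real systems such as HBase or Cassandra differ in the details):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash the leaves, then repeatedly hash adjacent pairs
    until a single root node remains."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate last node on odd levels
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0].hex()

root = merkle_root([b"block-a", b"block-b", b"block-c"])
print(root)
```

Changing any single leaf changes the root, which is what makes anti-entropy comparisons between replicas cheap.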
Mastering the game of Go with deep neural networks and tree search (article o...) (Ilya Kuzovkin)
"This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away." In this presentation I go through the pipeline following the steps the algorithm does to understand the process.
Deep Convolutional GANs - meaning of latent space (Hansol Kang)
DCGAN not only applies a convolutional network to the GAN architecture, but also finds meaning in the latent space.
A review of the DCGAN paper and a PyTorch-based implementation.
A review of the issues raised in the VAE seminar.
my github : https://github.com/messy-snail/GAN_PyTorch
[References]
https://github.com/znxlwm/pytorch-MNIST-CelebA-GAN-DCGAN
https://github.com/taeoh-kim/Pytorch_DCGAN
Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
Big Data is a new term used in Business Analytics to identify datasets that we cannot manage with current methodologies or data mining software tools due to their large size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary due to the volume, variability, and velocity of such data.
In this talk, we will focus on advanced techniques in Big Data mining in real time using evolving data stream techniques: using a small amount of time and memory resources, and being able to adapt to changes. We will discuss a social network application of data stream mining to compute user influence probabilities. And finally, we will present the MOA software framework with classification, regression, and frequent pattern methods, and the SAMOA distributed streaming software that runs on top of Storm, Samza and S4.
Simple representations for learning: factorizations and similarities (Gael Varoquaux)
Real-life data seldom comes in the ideal form for statistical learning. This talk focuses on high-dimensional problems for signals and discrete entities: when dealing with many correlated signals or entities, it is useful to extract representations that capture these correlations.
Matrix factorization models provide simple but powerful representations. They are used for recommender systems across discrete entities such as users and products, or to learn good dictionaries to represent images. However, they entail large computing costs on very high-dimensional data: databases with many products or high-resolution images. I will present an algorithm to factorize huge matrices based on stochastic subsampling that gives up to 10-fold speed-ups [1].
With discrete entities, the explosion of dimensionality may be due to variations in how a smaller number of categories are represented. Such a problem of "dirty categories" is typical of uncurated data sources. I will discuss how encoding this data based on similarities recovers a useful category structure with no preprocessing. I will show how it interpolates between one-hot encoding and techniques used in character-level natural language processing.
[1] A. Mensch, J. Mairal, B. Thirion, G. Varoquaux. "Stochastic subsampling for factorizing huge matrices." IEEE Transactions on Signal Processing 66 (1), 113-128 (2018).
[2] P. Cerda, G. Varoquaux, B. Kégl. "Similarity encoding for learning with dirty categorical variables." Machine Learning (2018): 1-18.
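To make the "dirty categories" idea concrete, here is a toy sketch of similarity encoding: each string is represented by its similarity to a set of known categories, so misspelled or suffixed variants land close to the right category without preprocessing. The character 3-gram Jaccard similarity used here is an illustrative choice; [2] studies several similarity measures:

```python
def ngrams(s, n=3):
    s = f"  {s.lower()} "  # pad so short strings still yield n-grams
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def similarity_encode(value, categories, n=3):
    """Encode a (possibly dirty) string as its n-gram Jaccard
    similarity to each known category; an exact match gives 1.0."""
    g = ngrams(value, n)
    return [len(g & ngrams(cat, n)) / len(g | ngrams(cat, n))
            for cat in categories]

cats = ["police officer", "fire fighter", "teacher"]
vec = similarity_encode("police officer", cats)
dirty = similarity_encode("Police Officer II", cats)
print(vec, dirty)
```

Note how this interpolates between one-hot encoding (a clean value scores 1.0 against its own category) and character-level representations (a dirty variant still scores highest against the right category).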
Word embeddings have received a lot of attention since Tomas Mikolov and collaborators published word2vec in 2013 and showed that the embeddings that the neural network learned by “reading” a large corpus of text preserved semantic relations between words. As a result, this type of embedding started being studied in more detail and applied to more serious NLP and IR tasks such as summarization, query expansion, etc. More recently, researchers and practitioners alike have come to appreciate the power of this type of approach and have started a cottage industry of adapting Mikolov’s original approach to many different areas.
In this talk we will cover the implementation and mathematical details underlying tools like word2vec and some of the applications word embeddings have found in various areas. Starting from an intuitive overview of the main concepts and algorithms underlying the neural network architecture used in word2vec, we will proceed to discussing the implementation details of the word2vec reference implementation in TensorFlow. Finally, we will provide a bird's-eye view of the emerging field of “2vec" (dna2vec, node2vec, etc.) methods that use variations of the word2vec neural network architecture.
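As a taste of the mechanics discussed in the talk, the skip-gram variant of word2vec trains on (center, context) pairs drawn from a sliding window over the corpus. A minimal sketch of the pair generation (the window size and toy corpus are illustrative):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in word2vec's
    skip-gram model: each word predicts its neighbors in the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("the quick brown fox".split(), window=1)
print(pairs)
# → [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#    ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

The network then learns embeddings by maximizing the probability of the context word given the center word over pairs like these.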
This (short) version of the Tutorial was presented at #AIWTB https://ai.withthebest.com/. See https://bmtgoncalves.github.io/word2vec-and-friends/ for further details on future (and longer) editions and sign up to http://tinyletter.com/dataforscience for related news and updates.
We start by very briefly introducing the Twitter platform and detailing the demographics of the users and the biases they introduce. The relationship between geography, mobility and social network properties will be described using the Twitter service as a case study. Finally, tutorial attendees will get the chance to review the most seminal works in the area where spatial and geographic perspectives are highlighted.
The increasing availability of huge amounts of data on every aspect of societal and individual behavior has created unprecedented opportunities, but it has also created new challenges. On one hand, it is now much easier to acquire detailed datasets on practically anything, while on the other it is much harder to make sure that our observations and conclusions are well justified and robust. In this short 3h tutorial we will briefly introduce some of the fundamental principles and algorithms of Machine Learning in an intuitive and practical way. Mathematical requirements will be kept to a minimum, and short snippets of Python code will be presented to illustrate the application of each algorithm.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
This is the "MNIST 10-Class Classifiers" project presentation that was part of "Introduction to Machine Learning" course at UCSC Silicon Valley Extension, CA. This Classifier was implemented in Jupyter Notebook, using Python, SciPy, Pandas, NumPy, and Matplotlib.
The goals were:
* Find the number of dimensions to use after performing dimension reduction with PCA.
* Train different supervised and unsupervised classifiers with the MNIST training data set.
* Validate the trained classifier with the MNIST testing data set.
* Compare and contrast the behaviour of supervised and unsupervised algorithms with respect to MNIST data.
* Measure classifier performance and compute metrics.
* Learn from each other to analyze and implement ML algorithms.
* Present the general training and testing approach/design and results.
* Focus on improving accuracy of classifiers, as HW is cheap compared to when the Classifier algorithms were designed.
* Across team - share ideas and review design, implementation and results.
Our team was called Cortes Machine Learning Group.
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ... | Simplilearn
This presentation on Machine Learning will help you understand why Machine Learning came into the picture, what Machine Learning is, the types of Machine Learning, and Machine Learning algorithms, with a detailed explanation of linear regression, decision trees and support vector machines. At the end you will also see a use case implementation where we classify whether a recipe is for a cupcake or a muffin using the SVM algorithm. Machine learning is a core sub-area of artificial intelligence; it enables computers to get into a mode of self-learning without being explicitly programmed. When exposed to new data, these computer programs are enabled to learn, grow, change, and develop by themselves. So, to put it simply, the iterative aspect of machine learning is the ability to adapt to new data independently. Now, let us get started with this Machine Learning presentation and understand what it is and why it matters.
Below topics are explained in this Machine Learning presentation:
1. Why Machine Learning?
2. What is Machine Learning?
3. Types of Machine Learning
4. Machine Learning Algorithms
- Linear Regression
- Decision Trees
- Support Vector Machine
5. Use case: Classify whether a recipe is of a cupcake or a muffin using SVM
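A minimal sketch of the use case in item 5 with scikit-learn's SVC. The two features (cups of flour and butter per batch) and all data values are invented for illustration; the actual tutorial's recipe features may differ:

```python
from sklearn import svm

# Hypothetical recipes: (flour cups, butter cups) per batch
# labels: 0 = cupcake (more butter), 1 = muffin (more flour)
X = [[1.0, 0.50], [1.1, 0.55], [0.9, 0.45],
     [2.0, 0.25], [2.2, 0.20], [1.9, 0.30]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel suffices for this linearly separable toy data
clf = svm.SVC(kernel="linear")
clf.fit(X, y)

pred = clf.predict([[2.1, 0.22], [1.0, 0.52]])
print(pred)  # → [1 0]: first recipe is a muffin, second a cupcake
```

The SVM finds the maximum-margin boundary between the two classes; new recipes are classified by which side of that boundary they fall on.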
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people’s digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars. This Machine Learning course prepares engineers, data scientists and other professionals with knowledge and hands-on skills required for certification and job competency in Machine Learning.
Why learn Machine Learning?
Machine Learning is taking over the world, and with that, there is a growing need among companies for professionals who know the ins and outs of Machine Learning.
The Machine Learning market size is expected to grow from USD 1.03 Billion in 2016 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period.
We recommend this Machine Learning training course for the following professionals in particular:
1. Developers aspiring to be a data scientist or Machine Learning engineer
2. Information architects who want to gain expertise in Machine Learning algorithms
3. Analytics professionals who want to work in Machine Learning or artificial intelligence
4. Graduates looking to build a career in data science and Machine Learning
Learn more at: https://www.simplilearn.com/
This slide deck is my presentation for a reading circle on the "Machine Learning Professional Series".
Japanese version is here.
http://www.slideshare.net/matsukenbook/ss-50545587
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
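For reference, the "Monolithic" baseline mentioned above is plain PageRank by power iteration over all vertices. A minimal sketch, with dead-end rank redistributed uniformly across vertices (one common way to handle the dead-end issue; the report's own strategy differs):

```python
def pagerank(graph, d=0.85, tol=1e-10, max_iter=200):
    """Standard (monolithic) PageRank by power iteration.
    `graph` maps each vertex to its list of out-neighbors.
    Dead ends (no out-links) spread their rank uniformly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dead ends, to be redistributed uniformly
        leaked = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - d) / n + d * leaked / n for v in nodes}
        for v in nodes:
            for u in graph[v]:
                new[u] += d * rank[v] / len(graph[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank

g = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}  # "d" is a dead end
r = pagerank(g)
print(sum(r.values()))  # ranks always sum to (approximately) 1
```

Every vertex is touched in every iteration, which is exactly the per-iteration cost that the levelwise, component-by-component scheme tries to avoid.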
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in the sophistication of cyberattacks aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
4. www.bgoncalves.com @bgoncalves
[Slide shows the first page of the "Expert Opinion" article: Alon Halevy, Peter Norvig, and Fernando Pereira (Google), "The Unreasonable Effectiveness of Data", IEEE Intelligent Systems, 2009. The article opens from Eugene Wigner's "The Unreasonable Effectiveness of Mathematics in the Natural Sciences" and argues that a large corpus could serve as the basis of a complete model for certain tasks, if only we knew how to extract the model from the data.]
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf
7. From Data To Information
Statistics are numbers that summarize raw facts and figures in some meaningful way. They present key ideas that may not be immediately apparent by just looking at the raw data, and by data, we mean facts or figures from which we can draw conclusions. As an example, you don't have to wade through lots of football scores when all you want to know is the league position of your favorite team. You need a statistic to quickly give you the information you need.
The study of statistics covers where statistics come from, how to calculate them, and how you can use them effectively.
[Flow diagram: Gather data -> Analyze -> Draw conclusions]
- Gather data: at the root of statistics is data. Data can be gathered by looking through existing sources, conducting experiments, or conducting surveys.
- Analyze: once you have data, you can analyze it and generate statistics. You can calculate probabilities to see how likely certain events are, test ideas, and indicate how confident you are about your results.
- Draw conclusions: when you've analyzed your data, you make decisions and predictions.
11. Central Limit Theorem
- As $n \to \infty$, the random variables:
  $S_n = \frac{1}{n} \sum_i x_i$
- with:
  mean $\mu$ and variance $\sigma^2$
- converge to a normal distribution:
  $\sqrt{n}\,(S_n - \mu) \to \mathcal{N}(0, \sigma^2)$
- after some manipulations, we find:
  $S_n \sim \mu + \frac{\mathcal{N}(0, \sigma^2)}{\sqrt{n}}$
The estimation of the mean converges to the true mean with the square root of the number of samples:
  $\mathrm{SE} = \frac{\sigma}{\sqrt{n}}$
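A quick numerical check of the square-root scaling above (the sample sizes, distribution, and seed are arbitrary choices for illustration): quadrupling the sample size should halve the standard error of the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
trials = 2000

def std_of_means(n):
    # Draw `trials` independent samples of size n and
    # measure the spread of their sample means
    samples = rng.normal(loc=5.0, scale=sigma, size=(trials, n))
    return samples.mean(axis=1).std()

se_100 = std_of_means(100)   # ~ sigma / sqrt(100) = 0.2
se_400 = std_of_means(400)   # ~ sigma / sqrt(400) = 0.1
print(se_100, se_400, se_100 / se_400)  # ratio ~ 2
```

The same scaling holds for non-Gaussian samples, which is the whole point of the theorem.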
13. Experimental Measurements
- Experimental errors are commonly assumed to be Gaussian distributed
- Many experimental measurements are actually averages:
  - Instruments have a finite response time and the quantity of interest varies quickly over time
  - Stochastic environmental factors
  - Etc.
14. MLE - Fitting a theoretical function to experimental data (Least Squares Fitting)
- In an experimental measurement, we expect (CLT) the experimental values to be normally distributed around the theoretical value with a certain variance. Mathematically, this means:
  $p(y) \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y - f(x))^2}{2\sigma^2}\right]$
- where $y$ are the experimental values and $f(x)$ the theoretical ones. The log-likelihood is then:
  $\mathcal{L} = -\frac{N}{2} \log\left(2\pi\sigma^2\right) - \sum_i \frac{(y_i - f(x_i))^2}{2\sigma^2}$
- where we see that to maximize the likelihood we must minimize the sum of squares.
15. MLE - Linear Regression
- Let's say we want to fit a straight line to a set of points:
  $y = w \cdot x + b$
- The log-likelihood function then becomes:
  $\mathcal{L} = -\frac{N}{2} \log\left(2\pi\sigma^2\right) - \sum_i \frac{(y_i - w \cdot x_i - b)^2}{2\sigma^2}$
- With partial derivatives:
  $\frac{\partial \mathcal{L}}{\partial w} \propto \sum_i \left[2 x_i \left(y_i - w \cdot x_i - b\right)\right]$
  $\frac{\partial \mathcal{L}}{\partial b} \propto \sum_i \left[\left(y_i - w \cdot x_i - b\right)\right]$
- Setting to zero and solving for $\hat{w}$ and $\hat{b}$:
  $\hat{w} = \frac{\sum_i (x_i - \langle x \rangle)(y_i - \langle y \rangle)}{\sum_i (x_i - \langle x \rangle)^2}$
  $\hat{b} = \langle y \rangle - \hat{w} \langle x \rangle$
16. MLE for Linear Regression (MLElinear.py)

from __future__ import print_function
import sys
import numpy as np
from scipy import optimize

data = np.loadtxt(sys.argv[1])
x = data.T[0]
y = data.T[1]

# Closed-form MLE solution
meanx = np.mean(x)
meany = np.mean(y)
w = np.sum((x - meanx) * (y - meany)) / np.sum((x - meanx) ** 2)
b = meany - w * meanx
print(w, b)

# We can also optimize the likelihood expression directly
def likelihood(params):
    sigma = 1.0
    w, b = params
    return np.sum((y - w * x - b) ** 2) / (2 * sigma ** 2)

w, b = optimize.fmin_bfgs(likelihood, [1.0, 1.0])
print(w, b)
20. 3 Types of Machine Learning
- Unsupervised Learning
  - Autonomously learn a good representation of the dataset
  - Find clusters in the input
- Supervised Learning
  - Predict output given input
  - A training set of known inputs and outputs is provided
- Reinforcement Learning
  - Learn a sequence of actions to maximize payoff
  - Discount factor for delayed rewards
23. K-Means
- Choose k random points to be the initial centroids of the clusters
- Assign each point to the cluster whose centroid is closest
- Recompute the centroid positions (mean position of the cluster's points)
- Repeat until convergence
K-Means: Convergence
• How to quantify the "quality" of the solution found at each iteration n?
• Measure the "Inertia", the squared intra-cluster distance:

I_n = \sum_{i=0}^{N} \|x_i - \mu_i\|^2

where \mu_i are the coordinates of the centroid of the cluster to which x_i is assigned.
• Smaller values are better
• Can stop when the relative variation is smaller than some value:

\frac{|I_{n+1} - I_n|}{I_n} < tol
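The steps and stopping criterion above can be sketched in a few lines of numpy (a minimal illustration, not a production implementation; the initialization scheme and tolerance value here are arbitrary choices):

```python
import numpy as np

def kmeans(X, k, tol=1e-6, max_iter=100, seed=42):
    rng = np.random.RandomState(seed)
    # Choose k points at random to be the initial centroids
    centroids = X[rng.choice(len(X), k, replace=False)]
    inertia = np.inf
    for _ in range(max_iter):
        # Assign each point to the cluster whose centroid is closest
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute the centroid positions (mean cluster position)
        centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
        # Stop when the relative variation of the inertia I_n is below tol
        new_inertia = np.sum((X - centroids[labels]) ** 2)
        if abs(inertia - new_inertia) / new_inertia < tol:
            break
        inertia = new_inertia
    return labels, centroids, new_inertia

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + 50])  # two compact blobs
labels, centroids, inertia = kmeans(X, 2)
```

For real work, `sklearn.cluster.KMeans` implements the same loop with a smarter initialization (k-means++).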
Silhouettes
• For each point x_i define a_c(x_i) as:

a_c(x_i) = \frac{1}{N_c} \sum_{j \in c} \|x_i - x_j\|

the average distance between point x_i and every other point within cluster c.
• Let b(x_i) be:

b(x_i) = \min_{c \neq c_i} a_c(x_i)

the minimum value of a_c(x_i) excluding the point's own cluster c_i.
• The silhouette of x_i is then:

s(x_i) = \frac{b(x_i) - a_{c_i}(x_i)}{\max\left\{b(x_i),\, a_{c_i}(x_i)\right\}}
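These definitions are implemented in scikit-learn; a quick sanity check on two well-separated blobs (the synthetic dataset here is made up for illustration, and values near 1 indicate well-separated clusters):

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 2), rng.randn(30, 2) + 10])  # two well-separated blobs
labels = np.array([0] * 30 + [1] * 30)

s = silhouette_samples(X, labels)      # s(x_i) for every point
mean_s = silhouette_score(X, labels)   # average silhouette over all points
print(mean_s)
```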
Principal Component Analysis
• The Principal Component projection, T, of a matrix A is defined as:

T = AW

• where W is the eigenvector matrix of A^T A
• W also corresponds to the right singular vectors of A obtained by Singular Value
Decomposition (SVD), a generalization of the eigenvalue/eigenvector decomposition
for non-square matrices:

A = U \Sigma W^T

• So we can write:

T = AW = U \Sigma W^T W = U \Sigma

• Showing that the Principal Component projection corresponds to the left
singular vectors of A scaled by the respective singular values \Sigma
• Columns of T are ordered in order of decreasing variance.
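These identities are easy to verify numerically (a numpy sketch; the random matrix is just an example):

```python
import numpy as np

rng = np.random.RandomState(0)
A = rng.randn(100, 3)
A = A - A.mean(axis=0)  # PCA assumes centered data

U, s, Wt = np.linalg.svd(A, full_matrices=False)  # A = U Sigma W^T
W = Wt.T  # right singular vectors = eigenvectors of A^T A

T_proj = A @ W  # T = A W
T_svd = U * s   # T = U Sigma (each column of U scaled by its singular value)
print(np.allclose(T_proj, T_svd))  # True
```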
Principal Component Analysis
from __future__ import print_function
import sys
from sklearn.decomposition import PCA
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt(sys.argv[1])
x = data.T[0]
y = data.T[1]

pca = PCA()
pca.fit(data)

meanX = np.mean(x)
meanY = np.mean(y)

plt.style.use('ggplot')
plt.plot(x, y, 'r*')
plt.plot([meanX, meanX+pca.components_[0][0]*pca.explained_variance_[0]],
         [meanY, meanY+pca.components_[0][1]*pca.explained_variance_[0]], 'b-')
plt.plot([meanX, meanX+pca.components_[1][0]*pca.explained_variance_[1]],
         [meanY, meanY+pca.components_[1][1]*pca.explained_variance_[1]], 'g-')
plt.title('PCA Visualization')
plt.legend(['data', 'PCA1', 'PCA2'], loc=2)
plt.savefig('PCA.png')
PCA.py
Supervised Learning - Classification
[Illustration: dataset as a matrix of N samples (rows) by M features (columns), with a Label column]
• Dataset formatted as an N×M matrix of N samples and M
features
• Each sample belongs to a specific class or has a specific label.
• The goal of classification is to predict the class to which a
previously unseen sample belongs by learning the defining
regularities of each class
• Common algorithms:
• K-Nearest Neighbors
• Support Vector Machines
• Neural Networks
• Two fundamental types of problems:
• Classification
• Regression (see Linear Regression above)
Supervised Learning - Overfitting
[Illustration: the N×M data matrix split row-wise into a Training subset and a Testing subset]
• “Learning the noise”
• “Memorization” instead of “generalization”
• How can we prevent it?
• Split the dataset into two subsets: Training and Testing
• Train the model using only the Training dataset and evaluate
the results on the previously unseen Testing dataset.
• Different heuristics on how to split:
• Single split
• k-fold cross validation: split the dataset into k parts, train on k-1
of them and evaluate on the remaining one; repeat k times and average the results.
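A minimal k-fold sketch with scikit-learn (the classifier and the synthetic data are arbitrary stand-ins; any model with fit/score works the same way):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 5])
y = np.array([0] * 50 + [1] * 50)

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X[train_idx], y[train_idx])                  # train on k-1 parts
    scores.append(model.score(X[test_idx], y[test_idx]))   # evaluate on the held-out part
mean_score = np.mean(scores)
print(mean_score)
```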
K-nearest neighbors
• Perhaps the simplest of supervised learning algorithms
• Effectively memorizes all previously seen data
• Intuitively takes advantage of natural data clustering
• Defines the class of any datapoint to be the plurality class among its k
nearest neighbors
• It’s not obvious how to find the right value of k
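The whole algorithm fits in a few lines (a numpy sketch; the toy points are made up for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # "Memorize" the training data; classify x by the plurality
    # vote of its k nearest neighbors
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1])))  # → 0
```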
How the Brain “Works” (Cartoon version)
• Each neuron receives input from other neurons
• ~10^11 neurons, each with ~10^4 weights
• Weights can be positive or negative
• Weights adapt during the learning process
• “neurons that fire together wire together” (Hebb)
• Different areas perform different functions using same structure (Modularity)
Forward Propagation
• The output of a perceptron is determined by a sequence of steps:
• obtain the inputs
• multiply the inputs by the respective weights
• calculate output using the activation function
• To create a multi-layer perceptron, you can simply use the output of
one layer as the input to the next one.
• But how can we propagate back the errors and update the weights?
[Diagram: inputs x_1 … x_N and a constant 1 (bias) are multiplied by weights w_0j … w_Nj to produce the activation a_j from w^T x; the layer outputs a_1 … a_N, again with a constant 1, feed the next layer's weights w_0k … w_Nk to produce a_k from w^T a]
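The steps above can be sketched in numpy (a minimal forward pass, assuming a sigmoid activation; the layer sizes are arbitrary, and the constant 1 input carries the bias weight w_0):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weight_matrices):
    # Use the output of each layer as the input to the next one
    a = x
    for W in weight_matrices:
        a = np.append(a, 1.0)  # constant 1 input for the bias weight w_0
        a = sigmoid(W @ a)     # multiply inputs by weights, apply activation
    return a

rng = np.random.RandomState(0)
weights = [rng.randn(4, 4), rng.randn(2, 5)]  # 3 inputs -> 4 hidden -> 2 outputs
out = forward(np.array([0.5, -1.0, 2.0]), weights)
print(out.shape)  # (2,)
```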
Backward Propagation of Errors (BackProp)
• BackProp operates in two phases:
• Forward propagate the inputs and calculate the deltas
• Update the weights
• The error at the output is the squared difference between predicted
output and the observed one:

E = (t - y)^2

• Where t is the real output and y is the predicted one.
• For inner layers there is no “real output”!
Chain-rule
• From the forward propagation described above, we know that the
output of a neuron is:

y_j = w^T x

• But how can we calculate how to modify the weights w_{ij}?
• We take the derivative of the error with respect to the weights!
• Using the chain rule:

\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial y_j} \frac{\partial y_j}{\partial w_{ij}}

• And finally we can update each weight in the previous layer:

w_{ij} \leftarrow w_{ij} - \alpha \frac{\partial E}{\partial w_{ij}}

• where \alpha is the learning rate
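For a single linear neuron the update rule above is easy to run end to end (a sketch with made-up noiseless data and an arbitrary learning rate):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
true_w = np.array([1.0, -2.0, 0.5])
t = X @ true_w  # noiseless targets from a known weight vector

w = np.zeros(3)
alpha = 0.01  # learning rate
for _ in range(500):
    for x_i, t_i in zip(X, t):
        y = w @ x_i                    # forward pass: y = w^T x
        grad = -2.0 * (t_i - y) * x_i  # dE/dw for E = (t - y)^2, via the chain rule
        w = w - alpha * grad           # w <- w - alpha * dE/dw
print(w)  # should approach true_w
```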
Backprop
• Backpropagation solved the fundamental problem underlying the training of neural networks
• Unfortunately, computers were still too slow for large networks to be trained successfully
• Also, in many cases, there wasn’t enough available data
Support Vector Machines
• The decision plane has the form:

w^T x = 0

• We want w^T x \geq b for points in the “positive” class and w^T x \leq -b for points in the negative
class, where the “margin” 2b is as large as possible.
• Normalize such that b = 1 and solve the optimization problem:

\min_w \|w\|^2

subject to:

y_i \left(w^T x_i\right) \geq 1

• The margin is:

\frac{2}{\|w\|}
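We can check the margin formula with scikit-learn's SVC (the four points are made up; a very large C approximates the hard-margin problem above):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel='linear', C=1e6)  # large C ~ hard margin
clf.fit(X, y)

w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)  # the two classes are 4 apart along x
print(margin)
```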
Kernel “trick”
• The SVM procedure uses only dot products between vectors, never the vectors themselves.
• We can therefore redefine the dot product in any way we wish, replacing it with a kernel function.
• In effect we are implicitly mapping from a non-linearly-separable input space to a feature space where the problem becomes linearly separable.
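A quick illustration (synthetic data): a blob surrounded by a ring cannot be separated by any straight line, but with an RBF kernel in place of the plain dot product the same SVM handles it:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
theta = rng.uniform(0, 2 * np.pi, 100)
inner = rng.randn(100, 2) * 0.3                          # blob at the origin
outer = np.c_[np.cos(theta), np.sin(theta)] * 3          # ring of radius 3
X = np.vstack([inner, outer + rng.randn(100, 2) * 0.1])
y = np.array([0] * 100 + [1] * 100)

linear_acc = SVC(kernel='linear').fit(X, y).score(X, y)
rbf_acc = SVC(kernel='rbf').fit(X, y).score(X, y)
print(linear_acc, rbf_acc)  # the RBF kernel should be near perfect
```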