01. introduction

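Machine Learning?

■ Learning
○ The adaptability by which a system, after performing a task, uses the experience
gained during its reasoning process to revise and supplement its knowledge, so that
the next time it performs the same or a similar task it can solve it more efficiently
and effectively than the first time.
○ It includes a variety of processes, such as acquiring new declarative knowledge,
developing cognitive skills through instruction and practice, organizing new knowledge
into general and effective representations, and discovering new facts or theories
through observation or experiment.
※ See Wikipedia

■ Machine Learning
○ Machine learning is a branch of artificial intelligence concerned with developing
algorithms and techniques that enable computers to learn. For example, machine learning
can be used to train a system to tell whether a received e-mail is spam or not.
○ "The field of study that gives computers the ability to learn, that is, the ability
to perform actions not explicitly defined in code"
○ It refers to the ability to make predictions based on known properties learned from
training data.

■ Data Mining
○ Systematically and automatically finding statistical rules or patterns in large
stores of data.
○ It is also called KDD (knowledge discovery in databases).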

1. Introduction of Machine Learning
2. Naive Bayesian Classifier
3. Linear Regression
4. Logistic Regression
5. K-means Clustering
6. Graph Mining
7. Dimensionality Reduction (PCA)
8. Spectral Clustering
9. Association Rule Mining
10. Bayesian Network 1 & 2
11. Decision Tree
12. Support Vector Machine (SVM) 1 & 2
13. Hidden Markov Model (HMM)
14. Markov Chain Monte Carlo (MCMC)
15. Gibbs Sampling
16. Latent Dirichlet Allocation (LDA)
17. Neural Networks

Definition (Machine Learning)
A set of methods that can automatically detect patterns in data, and then use
the uncovered patterns to predict future data, or to perform other kinds of
decision making under uncertainty.
"Machine Learning: A Probabilistic Perspective"
Kevin P. Murphy

Supervised Learning
    Classification
        Bayesian Classifier
        Logistic Regression
        KNN Classifier
        Support Vector Machine (SVM)
    Regression
        Linear Regression
Unsupervised Learning
    Clustering
        K-means Clustering
        Spectral Clustering
    Dimensionality Reduction
        PCA
Reinforcement Learning

A feature vector is an $n$-dimensional vector of numerical features
that represents some object.
For example, for a document, $x_i$ is the $i$-th word in the document:
"I love you."  $\mathbf{x} = (\text{I}, \text{love}, \text{you})$
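
The slide's example uses the words themselves as features. One common way to turn them into numerical features is a bag-of-words count vector. The sketch below assumes a small, hypothetical vocabulary and a very simple tokenizer; the names `tokenize`, `bag_of_words`, and `vocabulary` are illustrative and do not come from the slides.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace, trimming surrounding punctuation."""
    words = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [w for w in words if w]


def bag_of_words(text, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary, chosen only for illustration.
vocabulary = ["i", "love", "you", "hate"]
x = bag_of_words("I love you.", vocabulary)
print(x)  # [1, 1, 1, 0]
```

Each position of `x` now corresponds to a fixed word, so documents of any length map to vectors of the same dimension.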

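$f$: a mapping from a feature vector $\mathbf{x}$ to an output $y$, which is a discrete or continuous value.

Input (feature vector $\mathbf{x}$)       Output $y$
E-mail (words)                            Spam or Not (discrete)
Web site (words)                          Sports, Science, or News (discrete)
Flower (appearance of the flower)         Line-flower or Mass-flower (discrete)
Child's height                            Father's height (continuous)
Number of rooms, room area                House price (continuous)

For example, $\mathbf{x}$ = (visit, money, buy, girl, Viagra) is mapped by $f$ to "Spam mail".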

Goal of Supervised Learning (predictive learning)
To learn a mapping (function) from input $\mathbf{x}$ to output $y$, given a labeled set of
input-output pairs $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathcal{D}$ is called the
training set and $N$ is the number of training examples.
- $\mathbf{x}_i$ : $D$-dimensional vector of numbers ⇒ feature vector
- $y_i$ : response variable ⇒ categorical or real-valued
If $y_i$ is categorical, the problem is classification;
if $y_i$ is real-valued, the problem is regression.
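
To make the training set of input-output pairs concrete, here is a minimal sketch that stores the pairs as a Python list and predicts with a 1-nearest-neighbor rule. The nearest-neighbor rule is just one simple choice of learned mapping, picked here for brevity; the data values and the name `nearest_neighbor_predict` are hypothetical.

```python
import math


def nearest_neighbor_predict(training_set, x_new):
    """Return the output y of the training example whose x is closest to x_new."""
    _, y_closest = min(training_set, key=lambda pair: math.dist(pair[0], x_new))
    return y_closest


# Classification: y is categorical.
# Hypothetical features: counts of the words (money, buy, meeting) in an e-mail.
spam_data = [
    ((3, 2, 0), "spam"),
    ((2, 3, 0), "spam"),
    ((0, 0, 2), "not spam"),
    ((0, 1, 3), "not spam"),
]
label = nearest_neighbor_predict(spam_data, (2, 2, 0))

# Regression: y is real-valued.
# Hypothetical features: (number of rooms, area); output: house price.
house_data = [((2, 50.0), 150.0), ((3, 80.0), 240.0), ((4, 120.0), 360.0)]
price = nearest_neighbor_predict(house_data, (3, 85.0))

print(label, price)
```

The same function serves both cases; only the type of $y$ decides whether the task is classification or regression.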

What is a natural grouping among these objects?
[Figure: candidate groupings: Simpson's Family / School Employees, Females / Males]

Goal of Unsupervised Learning (descriptive learning)
To find "interesting patterns" in the data, given $\mathcal{D} = \{\mathbf{x}_i\}_{i=1}^{N}$, where
$\mathcal{D}$ is called the training set and $N$ is the number of training examples.
- $\mathbf{x}_i$ : $D$-dimensional vector of numbers ⇒ feature vector
This is also known as knowledge discovery.

Goal of Reinforcement Learning
To learn how to act or behave when given occasional reward or punishment
signals.

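$f$: a mapping from a feature vector $\mathbf{x}$ to an output $y$, which is a discrete or continuous value.

Input (feature vector $\mathbf{x}$)       Output $y$
E-mail (words)                            Spam or Not (discrete)
Web site (words)                          Sports, Science, or News (discrete)
Flower (appearance of the flower)         Line-flower or Mass-flower (discrete)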

Goal of Classification
To learn a mapping (function) from input $\mathbf{x}$ to output $y$, where $y \in \{1, \dots, C\}$,
with $C$ being the number of classes.
One way to formalize the problem is as function approximation.
Assume that $y = f(\mathbf{x})$ for some unknown function $f$. The goal of learning is to
estimate $f$ given a labeled training set, and then to make predictions
on novel inputs using $y^* = f^*(\mathbf{x})$.

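Compute our "best guess", given the training set $\mathcal{D}$, using
$$y^* = f^*(\mathbf{x}) = \arg\max_{c = 1, \dots, C} p(y = c \mid \mathbf{x}, \mathcal{D})$$
This corresponds to the most probable class label.
It is known as a MAP (maximum a posteriori) estimate.
Ex) mail spam filtering: compare the posterior probability of each class and pick the largest.
$p(y = 1 \mid \mathbf{x}, \mathcal{D})$
$p(y = 2 \mid \mathbf{x}, \mathcal{D})$
...
$p(y = C \mid \mathbf{x}, \mathcal{D})$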
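
The MAP rule can be sketched in a few lines. The example below assumes the class priors and the likelihood of one particular e-mail under each class are already known; these numbers are made up for illustration, and the function names `posterior` and `map_estimate` are hypothetical. The posterior is obtained with Bayes' rule, then the most probable class is selected.

```python
def posterior(prior, likelihood):
    """Normalize prior[c] * likelihood[c] into p(y = c | x) via Bayes' rule."""
    joint = {c: prior[c] * likelihood[c] for c in prior}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


def map_estimate(post):
    """Return the class label with the highest posterior probability."""
    return max(post, key=post.get)


# Hypothetical numbers for one e-mail x: class priors p(y = c) and likelihoods p(x | y = c).
prior = {"spam": 0.3, "not spam": 0.7}
likelihood = {"spam": 0.05, "not spam": 0.001}

post = posterior(prior, likelihood)
y_star = map_estimate(post)
print(y_star, post)
```

Although "not spam" has the larger prior here, the much larger likelihood under "spam" dominates the product, so the MAP label is "spam".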

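Goal of Regression
To learn a mapping (function) from input $\mathbf{x}$ to output $y$, where $y$ is continuous.
Linear model: $y = \epsilon_1 + \epsilon_2 x$
Quadratic model: $y = \epsilon_1 + \epsilon_2 x + \epsilon_3 x^2$
(here $\epsilon_1, \epsilon_2, \epsilon_3$ are the model coefficients)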
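
As a minimal sketch of fitting the linear model, the code below estimates the intercept and slope by ordinary least squares using the closed-form solution for a single input variable. Least squares is an assumption here, since the slide does not say how the coefficients are estimated; the data points and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = e1 + e2 * x; returns (e1, e2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    e2 = sxy / sxx
    e1 = mean_y - e2 * mean_x
    return e1, e2


# Hypothetical noisy observations that roughly follow y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
e1, e2 = fit_line(xs, ys)
print(round(e1, 2), round(e2, 2))  # 1.15 1.94
```

The quadratic model is still linear in its coefficients, so it can be fit by the same least-squares principle once $x^2$ is treated as an additional input feature.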

Goal of Clustering
To estimate the distribution over the number of clusters, $p(K \mid \mathcal{D})$, given the data $\mathcal{D}$; this
tells us whether there are subpopulations within the data.
To estimate which cluster each point belongs to.

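We often approximate the distribution $p(K \mid \mathcal{D})$ by its mode:
$$K^* = \arg\max_{K} p(K \mid \mathcal{D})$$
We can infer which cluster each data point belongs to by computing
$$z_i^* = \arg\max_{k} p(z_i = k \mid \mathbf{x}_i, \mathcal{D})$$
that is, by comparing
$p(z_i = 1 \mid \mathbf{x}_i, \mathcal{D})$
$p(z_i = 2 \mid \mathbf{x}_i, \mathcal{D})$
...
$p(z_i = K \mid \mathbf{x}_i, \mathcal{D})$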
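
The cluster-assignment step can be illustrated with a one-dimensional Gaussian mixture whose parameters are assumed to be already known. The mixture model, its parameter values, and the function names below are assumptions for illustration only; the slides do not fix a particular clustering model at this point. Note that the code numbers clusters from 0, while the slide numbers them from 1.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def cluster_posterior(x, weights, means, stds):
    """Return [p(z = k | x)] for k = 0..K-1 under a 1-D Gaussian mixture."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


def assign_cluster(x, weights, means, stds):
    """Return the index k that maximizes p(z = k | x)."""
    post = cluster_posterior(x, weights, means, stds)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical mixture with K = 2 well-separated clusters.
weights, means, stds = [0.5, 0.5], [0.0, 10.0], [1.0, 1.0]
data = [-0.3, 0.8, 9.5, 11.2]
labels = [assign_cluster(x, weights, means, stds) for x in data]
print(labels)  # [0, 0, 1, 1]
```

In practice the mixture parameters are not given and have to be estimated from the data as well; this sketch only covers the final argmax step.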

Goal of Dimensionality Reduction
To reduce the dimensionality by projecting the data onto a lower-dimensional
subspace which captures the "essence" of the data.
Motivation : Although the data may appear high dimensional, there may
only be a small number of degrees of variability, corresponding
to latent factors.
Latent factors : the underlying factors that describe most of the variability
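
As a rough sketch of projecting data onto a lower-dimensional subspace, the code below finds the first principal component (the PCA method listed in the course outline) with power iteration on the sample covariance matrix, then projects each point onto that direction. Power iteration is one simple way to obtain the leading eigenvector and is assumed here for brevity; the data points and function names are hypothetical.

```python
def first_principal_component(points, iterations=200):
    """Estimate the mean and the top principal direction by power iteration."""
    n, dims = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - mean[d] for d in range(dims)] for p in points]
    # Sample covariance matrix (dims x dims).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dims)]
           for a in range(dims)]
    # Start from a unit vector; this assumes it is not orthogonal to the top direction.
    v = [1.0 / dims ** 0.5] * dims
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dims)) for a in range(dims)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v


def project(points, mean, direction):
    """Map each point to its 1-D coordinate along the given direction."""
    return [sum((p[d] - mean[d]) * direction[d] for d in range(len(mean)))
            for p in points]


# Hypothetical 2-D points lying close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
mean, direction = first_principal_component(points)
z = project(points, mean, direction)
print(direction, z)
```

Here two coordinates per point are reduced to one while keeping most of the variability, because the points vary mainly along a single direction.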

✓ Kevin P. Murphy, "Machine Learning: A Probabilistic Perspective"
✓ http://ko.wikipedia.org/wiki/기계_학습
