Machine learning Introduction

Upload Details: uploaded as Adobe PDF

Usage Rights: © All Rights Reserved
  • Unsupervised learning (clustering, dimensionality reduction, e.g. topic models): learn structure from unlabeled data; closely related to density estimation; summarizes the data. Semi-supervised learning: use both labeled and unlabeled samples for training; collecting many labels can be costly, so we use both.
  • Besides these criteria, also consider how familiar you are with each model.
  • Expected risk: define a loss function, choose a prediction function, and take the joint distribution of the input variable and the response value; integrating the loss function over that joint distribution gives the expected risk. By minimizing the expected risk we find an optimal prediction function. In practice, however, we do not know the joint distribution; what we have is a finite sample drawn (with or without bias) from it, possibly containing noise. So instead we find the prediction function by minimizing the loss function on that finite sample, i.e. we minimize the empirical risk. On the other hand, we usually restrict the prediction function to a specified function family F, which may well not contain the optimal or near-optimal functions. The error therefore has two parts: first, how close the best prediction function in family F is to the truly optimal prediction function; second, the fact that we optimize the empirical risk rather than the expected risk.
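The decomposition described in this note can be written out explicitly. This is the standard formulation (my rendering, not copied from the slides):

```latex
% Expected risk: loss integrated over the (unknown) joint distribution P(x, y)
R(f) = \int L\big(y, f(x)\big)\, dP(x, y)

% Empirical risk: average loss over the N observed samples
R_{\mathrm{emp}}(f) = \frac{1}{N} \sum_{i=1}^{N} L\big(y_i, f(x_i)\big)

% Error relative to the true optimum f^*, for the minimizer \hat{f} found in family F:
% approximation error (how far F is from f^*) plus
% estimation error (minimizing R_emp instead of R)
R(\hat{f}) - R(f^*) =
  \underbrace{\Big(\inf_{f \in F} R(f) - R(f^*)\Big)}_{\text{approximation}}
  + \underbrace{\Big(R(\hat{f}) - \inf_{f \in F} R(f)\Big)}_{\text{estimation}}
```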
  • Logistic regression is one of the most popular classifiers. Advantages: 1. easy to understand and implement; 2. decent performance; 3. lightweight, with little time needed for training and prediction (can handle large datasets); 4. easy to parallelize. Value to attendees: know what logistic regression is, its advantages and disadvantages, and what kinds of problems it suits; L1 and L2 regularization; how to do inference by maximizing the likelihood with gradient descent, and how to implement it.
  • For a generalized linear model, if the response variable follows a binomial or multinomial distribution and the logit function is chosen as the link function, the model is logistic regression. The logistic function is the inverse of the logit function.
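A minimal sketch of that inverse relationship (the function names are mine, not from the slides):

```python
import math

def logit(p):
    """Logit link: maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))

def logistic(z):
    """Logistic function: the inverse of the logit, maps R back to (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Applying one after the other recovers the original probability
p = 0.8
assert abs(logistic(logit(p)) - p) < 1e-12
```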
  • Binary (binomial) logistic regression
  • The negative gradient direction is the direction in which the function value decreases fastest.
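This is the idea behind gradient descent as used later for the likelihood; a toy sketch on a one-dimensional function (illustrative, not the slides' code):

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step along the negative gradient,
    the direction of steepest descent."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
# w_min converges toward 3, the minimizer
```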
  • Before recomputing the likelihood, let's look at the theory behind these two kinds of regularization.
  • Assume all weights follow the same distribution.
  • The first derivative of the Laplace distribution is discontinuous at zero. Assume all weights follow the same distribution (mean 0, same Laplace parameter). W_k must not change sign within a single update.
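One common way to honor that sign constraint (as in truncated-gradient style SGD for L1; this sketch is my illustration, not code from the slides) is to clip the weight at zero whenever the penalty step would flip its sign:

```python
def l1_sgd_update(w, grad, lr, lam):
    """One SGD step on a single weight with an L1 (Laplace prior) penalty.

    The L1 term is non-differentiable at 0, so if applying the penalty
    would flip the sign of the weight, we clip it to exactly 0 instead:
    the weight must not change sign within a single update.
    """
    # Plain gradient step on the data loss
    w_new = w - lr * grad
    # Apply the L1 penalty of strength lam, clipping at zero
    if w_new > 0:
        w_new = max(0.0, w_new - lr * lam)
    elif w_new < 0:
        w_new = min(0.0, w_new + lr * lam)
    return w_new
```

Clipping at zero is also what makes many weights land exactly at 0, producing the sparsity discussed in the next note.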
  • The weights fitted with L1 are usually sparse, which brings two benefits: it does feature selection for us, and sparse models are more convenient from an engineering standpoint.
  • Adding a decay ratio slightly improved AUC (0.845 -> 0.847). The appropriate decay ratio differs at different steps. Iteration count: depends on the size of the dataset.
  • Example: today is the first day of the college entrance exam. When choosing a major, each student has several candidates but can pick only one (computer science, finance, chemistry, mathematics, physics, biology). How this differs from binomial applications: a multi-class problem can be converted into several binary problems. If the task is "find the top 10% of students in each course", binary logistic regression works. If the task is "for each student, find the course (or few courses) he or she is best at", binary models are less suitable: the predicted probabilities over the classes do not sum to 1, so probabilities across classes cannot be compared. Multinomial regression suits categorical response values, and is less suitable for ordinal ones. I implemented it.
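The multinomial (softmax) formulation fixes exactly that comparability problem, since its class probabilities sum to 1. A sketch (the major names and scores are made up for illustration):

```python
import math

def softmax(scores):
    """Multinomial logistic (softmax): class probabilities sum to 1,
    so scores for different classes are directly comparable."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear scores for one student over six majors
majors = ["CS", "finance", "chemistry", "math", "physics", "biology"]
probs = softmax([2.0, 1.0, 0.1, 1.5, 0.3, 0.2])
best = majors[probs.index(max(probs))]     # the single best class, here "CS"
```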
  • Link function: (1) a key component of the generalized linear model, extending linear regression to generalized linear models; (2) the inverse of the link function takes arguments in (-∞, +∞); if y follows a binomial distribution, the dependent variable lies in the [0, 1] interval. The inverse of any continuous cumulative distribution function (CDF) can be used for the link, since the CDF's range is [0, 1].
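To illustrate the CDF remark: using the standard normal CDF as the inverse link gives probit regression, a sibling of logit. A sketch using the stdlib error function (my example, not from the slides):

```python
import math

def normal_cdf(z):
    """Standard normal CDF via erf: maps (-inf, +inf) into (0, 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def logistic(z):
    """The usual logit-based inverse link, for comparison."""
    return 1 / (1 + math.exp(-z))

# Both map the linear predictor w*x from (-inf, +inf) to a valid
# probability, so either can serve as the inverse link for a
# binomial response.
for z in (-2.0, 0.0, 2.0):
    assert 0 < normal_cdf(z) < 1 and 0 < logistic(z) < 1
```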
  • Generalized linear model: linear models in the broad sense. Each has a basic linear unit W*X (as in linear regression), and a link function relates that linear unit to a response variable with some distribution. This family includes linear regression (normal distribution), logistic regression (binomial/multinomial distribution), and Poisson regression (Poisson distribution). For a binomial/multinomial distribution we can also choose link functions other than the logit (logistic regression in the broad sense).

Machine learning Introduction: Presentation Transcript

  • Machine Learning Introduction, guodong@hulu.com. Agenda: Machine learning introduction; Logistic regression; Feature selection; Boosting, tree boosting. See more ML posts: http://dongguo.me/
  • Machine Learning Makes Life Better
  • Learning • What is learning – Find rules from data/experience • Why learning is possible – Assume rules exist in this world • How to learn – Inductive
  • What is machine learning • "Machine Learning is a field of study that gives computers the ability to learn without being explicitly programmed" – Arthur Samuel (1959) • "Machine learning is the study of computer algorithms that improve automatically through experience" – Tom Mitchell (1998)
  • Overview of machine learning (diagram): Machine Learning splits into Supervised Learning (Classification, Regression), Unsupervised Learning, and Semi-supervised Learning
  • Outline • Supervised Learning • Case Study • Challenge • Resource
  • Supervised learning • Concepts • Definition • Models • Metrics • Open Questions
  • Concepts (flow diagram): Problem → Generate dataset → Dataset (sample/instance, feature vector, label) → Train → Model → Predict → Test, with model tuning and feature selection in the loop
  • What is Supervised learning • Find a function (from some function space) to predict for unseen instances, from the labeled training data – Function space: determined by the chosen model – Find the function: minimize error on training data with some cost function • 2 types: Classification and regression
  • Formal definition • Given a training dataset {(x_i, y_i)}, i = 1, …, N • Define a loss function L(y, ŷ), where ŷ = f(x) • Target: f̂ = argmin_f G(f), where G(f) = (1/N) Σ_{i=1}^{N} L(y_i, f(x_i))
  • Models for supervised learning • Classification and regression – For classification: LR (Logistic regression), Naïve Bayes – For regression: linear regression – For both: Trees, KNN, SVM, ANN • Generative and Discriminative – Generative: Naïve Bayes, GMM, HMM – Discriminative: KNN, LR, SVM, ANN, Trees • Parametric and nonparametric – Parametric: LR, Naïve Bayes, ANN – Nonparametric: Trees, KNN, kernel methods
  • Decision Tree • Would you like to date somebody? (illustrative tree: root split on gender, then a "good looking?" split, with leaves Accept or Pass)
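A decision tree is just nested feature tests ending in a leaf label; the slide's dating example transcribes roughly like this (the exact splits are my illustrative reading of the slide):

```python
def would_date(person):
    """Route one instance down the tree's splits to a leaf label.

    Splits loosely follow the slide's dating example (illustrative only).
    """
    if person["gender"] == "female":
        # second split on looks for this branch
        if person["looks"] == "very good":
            return "Accept"
        return "Pass"
    # other branch of the root split (illustrative)
    return "Pass"
```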
  • K-Nearest Neighbor classifier (figure: decision boundaries for K = 15 and K = 1)
  • Naïve Bayes • Bayes classifier • Conditional independence assumption • With this assumption …
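Under the conditional independence assumption the joint likelihood factorizes, so each class y is scored by P(y) · Π_j P(x_j | y). A self-contained sketch (the probability tables are made up for illustration):

```python
def naive_bayes_predict(x, priors, likelihoods):
    """Score each class by P(y) * prod_j P(x_j | y); the conditional
    independence assumption is what lets the likelihood factorize."""
    best_label, best_score = None, -1.0
    for y, prior in priors.items():
        score = prior
        for j, xj in enumerate(x):
            score *= likelihoods[y][j].get(xj, 1e-9)  # tiny floor for unseen values
        if score > best_score:
            best_label, best_score = y, score
    return best_label

# Toy probability tables (invented for this example)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{"free": 0.8, "hi": 0.2}, {"!!": 0.7, ".": 0.3}],
    "ham":  [{"free": 0.1, "hi": 0.9}, {"!!": 0.1, ".": 0.9}],
}
label = naive_bayes_predict(["free", "!!"], priors, likelihoods)  # "spam"
```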
  • Logistic regression • Logistic function
  • Artificial neural network
  • Support  vector  machine  
  • Model Inference • Typical inference methods – Gradient descent – Expectation Maximization – Sampling based
  • Model ensemble • Averaging or voting output of multiple classifiers • Bagging (bootstrap aggregating) – Train multiple base models – Vote multiple base classifiers with same weight – Improves model stability and avoids overfitting – Works well on unstable base classifiers • Adaboost (adaptive boosting) – Sequential base classifiers – Misclassified instances have higher weight in next base classifier – Weighted voting
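The bagging prediction step described above is just an equal-weight majority vote; a sketch with stub classifiers standing in for models trained on different bootstrap samples (illustrative only):

```python
from collections import Counter

def bagging_predict(x, base_models):
    """Bagging at prediction time: equal-weight majority vote over
    base classifiers (each trained on its own bootstrap resample)."""
    votes = Counter(model(x) for model in base_models)
    return votes.most_common(1)[0][0]

# Three stub "base classifiers" (real ones would be trained models)
models = [lambda x: "pos", lambda x: "pos", lambda x: "neg"]
label = bagging_predict(None, models)  # majority of the three votes: "pos"
```

Adaboost differs in that the base classifiers are trained sequentially and the final vote is weighted by each classifier's accuracy rather than equal.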
  • Evaluation metrics • Common metrics for classification – Accuracy – Precision-Recall – AUC • For regression – Mean absolute error (MAE) – Mean square error (MSE), RMSE
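Two of these metrics are simple enough to compute by hand; a sketch (function names are mine):

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fp), tp / (tp + fn)

def mse(y_true, y_pred):
    """Mean square error for regression; RMSE is its square root."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

precision, recall = precision_recall([1, 1, 0, 0], [1, 0, 1, 0])
# one true positive, one false positive, one false negative:
# precision = 1/2, recall = 1/2
```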
  • Question 1: How to choose a suitable model? (scores: 3 = good, 2 = fair, 1 = poor; source: The Elements of Statistical Learning, 2nd ed., p. 351)

    Characteristic                          Naïve Bayes  Trees  KNN  Logistic regression  Neural Networks  SVM
    Natural handling of "mixed"-type data        1          3     1           1                  1           1
    Robustness to outliers in input space        3          3     3           3                  1           1
    Computational scalability                    3          3     1           3                  1           1
    Interpretability                             2          2     1           2                  1           1
    Predictive power                             1          1     3           2                  3           3
  • Question 2: Can we find a 100% accurate model? • Expected risk • Empirical risk • Choose a family F of candidate prediction functions • Error
  • Case study: Predictive Demographic (workflow diagram) • Problem: is it an ML problem? What kind? Labels? Evaluation metric? Possible features (show, ad vote, ad selection, search…) and are they accessible? • Dataset generation: load login profiles; feature extraction ('show', 'ad vote', 'ad selection'); feature analysis (remove 'ad selection') • Choose a model: 1. familiar? (NB, ANN, LR, Tree, SVM) 2. computational cost? interpretability? precision? 3. data: amount? noise ratio? • Train and tune: try more features (add 'OS', 'browser', 'flash'); feature selection (remove 'flash' and non-anonymous features); try more models; model ensemble • Test: evaluation (AUC, precision-recall); challenges (noise, different joint distribution, evaluation) • Predictor on product: scoring, online update
  • Challenges in Machine learning • Data – Sparse data in high dimensions – Limited labels • Computation cost – Speed up advanced models – Parallelization • Application – Structured prediction
  • Resource • Conference • Books • Lectures • Dataset
  • Top conferences • ICML • NIPS • IJCAI/AAAI • KDD • Other related – WSDM, WWW, SIGIR, CIKM, ICDE, ICDM
  • Books • Machine Learning [link] by Mitchell • Pattern Recognition and Machine Learning [link] by Bishop • The Elements of Statistical Learning [link] • Scaling Up Machine Learning [link]
  • Lectures • Machine Learning open class – by Andrew Ng – Video on YouTube • Advanced topics in Machine Learning – Cornell • http://videolectures.net/
  • Other research resources • Research organizations – Yahoo Research [link] – Google Research publications [link] • Dataset – UCI Machine Learning Repository [link] – kaggle.com