Ml intro

  1. 1. Introduction to Machine Learning   Integrated Knowledge Solutions   https://
  2. 2. Agenda   •  What is machine learning?   •  Why machine learning and why now?   •  Machine learning terminology   •  Overview of machine learning methods   •  Machine learning to deep learning   •  Summary and Q & A
  3. 3. What is machine learning?
  4. 4. What is Machine Learning?   •  Machine learning deals with making computers learn to make predictions/decisions without explicitly programming them. Instead, a large number of examples of the underlying task are shown to the system, which optimizes a performance criterion to achieve learning.
  5. 5. An Example of Machine Learning: Credit Default Prediction   We have historical data about businesses and their delinquency. The data consists of 100 businesses. Each business is characterized by two attributes: business age in months and number of days delinquent in payment. We also know whether each business defaulted or not. Using machine learning, we can build a model to predict the probability that a given business will default.   [Scatter plot of the 100 businesses by business age and days delinquent]
  6. 6. Logistic Regression   •  The model used here is called the logistic regression model. Let's look at the following expression, where x1, x2, ..., xk are the attributes:   p = e^(a0 + a1x1 + ... + akxk) / (1 + e^(a0 + a1x1 + ... + akxk))   •  In our example, the attributes are business age and number of days of delinquency.   •  The quantity p always lies in the range 0-1 and thus can be interpreted as the probability of the outcome being default or no default.
  7. 7. Logistic Regression   •  By simple rewriting, we get:   log(p/(1-p)) = a0 + a1x1 + a2x2 + ... + akxk   •  This ratio is called the log odds   •  The parameters of the logistic model, a0, a1, ..., ak, are learned via an optimization procedure   •  The learned parameters can then be deployed in the field to make predictions
  8. 8. Model Details and Performance   [Plot of the predicted default probability for each of the 100 businesses]   Only in rare cases do we get a 100% accurate model.
  9. 9. Using the Model   •  What is the probability of a business defaulting, given that the business has been with the bank for 26 months and is delinquent for 58 days?   BUSAGE: 0.008; DAYSDELQ: 0.102; Intercept: -5.706   Plug the model parameters in to calculate p:   p = e^(0.008*26 + 0.102*58 - 5.706) / (1 + e^(0.008*26 + 0.102*58 - 5.706)) = 0.603
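The calculation on this slide can be reproduced directly. A minimal sketch (the function and constant names are mine; the coefficients and the 26-month/58-day query are from the slide):

```python
import math

# Learned logistic regression parameters (from the slide)
INTERCEPT = -5.706    # a0
A_BUSAGE = 0.008      # coefficient for business age in months
A_DAYSDELQ = 0.102    # coefficient for days delinquent

def default_probability(busage, daysdelq):
    """p = e^z / (1 + e^z) with z = a0 + a1*busage + a2*daysdelq."""
    z = INTERCEPT + A_BUSAGE * busage + A_DAYSDELQ * daysdelq
    return math.exp(z) / (1 + math.exp(z))

p = default_probability(26, 58)
print(round(p, 3))  # → 0.603
```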
  10. 10. Why  Machine  Learning  and  Why  Now?  
  11. 11. Why  Machine  Learning?  
  12. 12. Buzz about Machine Learning   "Every company is now a data company, capable of using machine learning in the cloud to deploy intelligent apps at scale, thanks to three machine learning trends: data flywheels, the algorithm economy, and cloud-hosted intelligence."   Three factors are making machine learning hot. These are cheap data, the algorithm economy, and cloud-based solutions.
  13. 13. Data is getting cheaper   For example, Tesla has 780 million miles of driving data, and adds another million every 10 hours
  14. 14. Algorithmic  Economy  
  15. 15. Algorithm  Economy  Players  in  ML  
  16. 16. Cloud-Based Intelligence   Emerging machine intelligence platforms hosting pre-trained machine learning models-as-a-service are making it easy for companies to get started with ML, allowing them to rapidly take their applications from prototype to production.   Many open source machine learning and deep learning frameworks running in the cloud allow easy leveraging of pre-trained, hosted models to tag images, recommend products, and do general natural language processing tasks.
  17. 17. An  Example  
  18. 18. Apps  for  Excel  
  19. 19. Machine  Learning  Terminology  
  20. 20. Feature Vectors in ML   •  A machine learning system builds models using properties of the objects being modeled. These properties are called features or attributes, and the process of measuring/obtaining such properties is called feature extraction. It is common to represent the properties of objects as feature vectors, e.g. for an iris flower:   x = [x1, x2, x3, x4]^T = [sepal width, sepal length, petal width, petal length]^T
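A feature vector is simply an ordered list of measured attributes. A minimal sketch in Python, using the four iris attributes from the slide (the numeric measurements are made up for illustration):

```python
# Extract a feature vector from a raw record, in the order used on the slide:
# sepal width, sepal length, petal width, petal length.
def extract_features(flower):
    return [flower["sepal_width"], flower["sepal_length"],
            flower["petal_width"], flower["petal_length"]]

flower = {"sepal_width": 3.5, "sepal_length": 5.1,
          "petal_width": 0.2, "petal_length": 1.4}
x = extract_features(flower)
print(x)  # → [3.5, 5.1, 0.2, 1.4]
```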
  21. 21. Learning Styles   •  Supervised Learning   –  Training data comes with answers, called labels   –  The goal is to produce labels for new data
  22. 22. Supervised Learning Models   •  Classification models   – Predict whether a customer is likely to be lost to a competitor   – Tag objects in a given image   – Determine whether an incoming email is spam or not
  23. 23. Supervised Learning Models   •  Regression models   – Predict the credit card balance of customers   – Predict the number of 'likes' for a posting   – Predict the peak load for a utility given weather information
  24. 24. Learning Styles   •  Unsupervised Learning   –  Training data comes without labels   –  The goal is to group data into different categories based on similarities   [Illustration: unlabeled data grouped into clusters]
  25. 25. Unsupervised Learning Models   •  Segment/cluster customers into different groups   •  Organize a collection of documents based on their content   •  Make recommendations for products
  26. 26. Learning Styles   •  Reinforcement Learning   –  Training data comes without labels   –  The learning system receives feedback from its operating environment to know how well it is doing   –  The goal is to perform better
  27. 27. Overview  of  Machine  Learning  Methods  
  28. 28. Walk Through an Example: Flower Classification   •  Build a classification model to differentiate between two classes of flower
  29. 29. How Do We Go About It?   •  Collect a large number of both types of flowers with the help of an expert   •  Measure some attributes that can help differentiate between the two types of flowers. Let those attributes be petal area and sepal area.
  30. 30. Scatter plot of 100 examples of flowers
  31. 31. We can separate the flower types using the linear boundary shown above. The parameters of the line represent the learned classification model.
  32. 32. Another possible boundary. This boundary cannot be expressed via an equation; however, a tree structure can be used to express it. Note that this boundary predicts the collected data better.
  33. 33. Yet another possible boundary. This boundary predicts the collected data without any error. Is it a better boundary?
  34. 34. Model Complexity   •  There are tradeoffs between the complexity of models and their performance in the field. A good design (model choice) weighs these tradeoffs.   •  A good design should avoid overfitting. How?   –  Divide the entire data into three sets   •  Training set (about 70% of the total data). Use this set to build the model   •  Test set (about 20% of the total data). Use this set to estimate the model accuracy after deployment   •  Validation set (remaining 10% of the total data). Use this set to determine the appropriate settings for the free parameters of the model. May not be required in some cases.
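The three-way split described above can be sketched in a few lines (the function name and fixed seed are my choices; the 70/20/10 proportions are from the slide):

```python
import random

def split_data(records, seed=0):
    """70/20/10 train/test/validation split, as suggested on the slide."""
    shuffled = records[:]                    # work on a copy
    random.Random(seed).shuffle(shuffled)    # shuffle before splitting
    n = len(shuffled)
    n_train = int(0.7 * n)
    n_test = int(0.2 * n)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation

train, test, val = split_data(list(range(100)))
print(len(train), len(test), len(val))  # → 70 20 10
```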
  35. 35. Measuring Model Performance   •  True Positive: Correctly identified as relevant   •  True Negative: Correctly identified as not relevant   •  False Positive: Incorrectly labeled as relevant   •  False Negative: Incorrectly labeled as not relevant   [Illustration: cat vs. no-cat images sorted into true positives, true negatives, false positives, and false negatives]
  36. 36. Precision, Recall, and Accuracy   •  Precision   –  Percentage of positive labels that are correct   –  Precision = (# true positives) / (# true positives + # false positives)   •  Recall   –  Percentage of positive examples that are correctly labeled   –  Recall = (# true positives) / (# true positives + # false negatives)   •  Accuracy   –  Percentage of correct labels   –  Accuracy = (# true positives + # true negatives) / (# of samples)
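These three formulas translate directly to code. A minimal sketch, using hypothetical counts for the cat-vs-no-cat example (the numbers are invented, not from the slides):

```python
def precision(tp, fp):
    # Fraction of positive predictions that are correct
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual positives that were found
    return tp / (tp + fn)

def accuracy(tp, tn, total):
    # Fraction of all predictions that are correct
    return (tp + tn) / total

# Hypothetical results: 8 true positives, 90 true negatives,
# 2 false positives, 4 false negatives
tp, tn, fp, fn = 8, 90, 2, 4
print(precision(tp, fp))                    # → 0.8
print(round(recall(tp, fn), 3))             # → 0.667
print(round(accuracy(tp, tn, tp + tn + fp + fn), 3))
```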
  37. 37. Sum-of-Squares Error for Regression Models   For a regression model, the error is measured by taking the square of the difference between the predicted output value and the target value for each training (test) example and summing this quantity over all examples:   E = Σn (predicted_n - target_n)^2
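In code, this error measure is a one-liner (the sample predictions and targets below are invented for illustration):

```python
def sum_of_squares_error(predicted, target):
    """Sum over all examples of (prediction - target)^2."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

# Errors: 0^2 + 0.5^2 + (-1)^2 = 1.25
print(sum_of_squares_error([1.0, 2.5, 3.0], [1.0, 2.0, 4.0]))  # → 1.25
```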
  38. 38. Bias and Variance   •  Bias: expected difference between the model's predictions and the truth   •  Variance: how much the model differs among training sets   •  Model Scenarios   –  High Bias: Model makes inaccurate predictions on training data   –  High Variance: Model does not generalize to new datasets   –  Low Bias: Model makes accurate predictions on training data   –  Low Variance: Model generalizes to new datasets
  39. 39. The Guiding Principle for Model Selection: Occam's Razor
  40. 40. Model Building Algorithms   •  Supervised learning algorithms   – Linear methods   – k-NN classifiers   – Neural networks   – Support vector machines   – Decision trees   – Ensemble methods
  41. 41. Illustration of k-NN Model   Predicted label of the test example with a 1-NN model: Versicolor   Predicted label of the test example with a 3-NN model: Virginica   [Scatter plot with the test example marked]
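The effect on this slide, where 1-NN and 3-NN disagree about a test example near the class boundary, can be reproduced with a tiny hand-made dataset (the coordinates below are invented for illustration):

```python
from collections import Counter

def knn_predict(train, test_point, k):
    """Majority label among the k nearest training examples."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[1], test_point)),
    )
    votes = Counter(label for label, _ in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (petal length, petal width) measurements
train = [
    ("versicolor", (4.9, 1.5)),
    ("virginica", (5.1, 1.8)),
    ("virginica", (5.2, 1.7)),
    ("versicolor", (4.5, 1.3)),
]
test_point = (5.0, 1.6)
print(knn_predict(train, test_point, 1))  # → versicolor
print(knn_predict(train, test_point, 3))  # → virginica
```

The single nearest neighbor is a versicolor, but two of the three nearest are virginica, so the prediction flips with k, just as on the slide.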
  42. 42. Illustration of Decision Tree Model   Petal width <= 0.8?  Yes → Setosa.  No → Petal length <= 4.75?  Yes → Versicolor.  No → Virginica.   The decision tree is automatically generated by a machine learning algorithm.
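Since this tree has only two splits, it can be written directly as nested conditionals (the thresholds come from the slide; the sample measurements are illustrative):

```python
def classify_iris(petal_length, petal_width):
    """The two-split decision tree from the slide as nested conditionals."""
    if petal_width <= 0.8:
        return "setosa"
    if petal_length <= 4.75:
        return "versicolor"
    return "virginica"

print(classify_iris(1.4, 0.2))  # → setosa
print(classify_iris(4.5, 1.3))  # → versicolor
print(classify_iris(5.5, 2.0))  # → virginica
```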
  43. 43. Model Building Algorithms   •  Unsupervised learning   – k-means clustering   – Agglomerative clustering   – Self-organizing feature maps   – Recommendation systems
  44. 44. K-means Clustering   Choose the number of clusters, k, and the initial cluster centers
  45. 45. K-means Clustering   Assign data points to clusters based on distance to the cluster centers
  46. 46. K-means Clustering   Update the cluster centers and reassign the data points.   K-means solves the clustering problem of minimizing the sum of squared distances from the data points to their cluster centers:   minimize Σ(n=1..N) ||x_n - center_n||^2
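The three steps shown across these slides (choose initial centers, assign points, update centers) can be sketched as a toy implementation (the sample points are invented; a real project would use a library routine):

```python
import random

def kmeans(points, k, iterations=10, seed=0):
    """The three k-means steps from the slides: pick centers, assign, update."""
    centers = random.Random(seed).sample(points, k)  # initial cluster centers
    for _ in range(iterations):
        # Assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(p, centers[i])))
            clusters[idx].append(p)
        # Update each center to the mean of its assigned points
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = tuple(sum(dim) / len(cluster)
                                   for dim in zip(*cluster))
    return centers, clusters

# Two well-separated groups of three points each
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # → [3, 3]
```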
  47. 47. Illustration of Recommendation System
  48. 48.  
  49. 49. Steps Towards a Machine Learning Project   •  Collect data   •  Explore data via scatter plots and histograms. Remove duplicates and data records with missing values   •  Check for dimensionality reduction   •  Build the model (an iterative process)   •  Transport/integrate the model with an application
  50. 50. Machine  Learning  to  Deep  Learning  
  51. 51. Machine Learning Limitation   •  Machine learning methods operate on manually designed features.   •  The design of such features for tasks involving computer vision, speech understanding, and natural language processing is extremely difficult. This puts a limit on the performance of the system.   Feature Extractor → Trainable Classifier
  52. 52. Processing Sensory Data is Hard   How do we bridge this gap between the pixels and their meaning via machine learning?
  53. 53. Sensory Data Processing is Challenging   So why not build integrated learning systems that perform end-to-end learning, i.e. learn the representation as well as the classification from raw data, without any engineered features?   Feature Learner → Trainable Classifier   An approach that performs end-to-end learning, typically through a series of successive abstractions, is, in a nutshell, deep learning.
  54. 54. An Example of Deep Learning Capability   SegNet is a deep learning architecture for pixel-wise semantic segmentation from the University of Cambridge.
  55. 55. Summary   •  We have just skimmed the surface of machine learning   •  The web is full of reading resources (free books, lecture notes, blogs, videos) for digging into machine learning   •  Several open source software resources (R, RapidMiner, Scikit-learn, etc.) are available for learning via experimentation   •  Applications based on vision, speech, and natural language processing are excellent candidates for deep learning
  56. 56.   https://