
# ML Intro

An introduction to machine learning for broad audiences.


1. Introduction to Machine Learning
   Integrated Knowledge Solutions, https://iksinc.wordpress.com/home/, iksinc@yahoo.com, sikrishan@gmail.com
2. Agenda
   - What is machine learning?
   - Why machine learning and why now?
   - Machine learning terminology
   - Overview of machine learning methods
   - Machine learning to deep learning
   - Summary and Q&A
3. What is machine learning?
4. What is Machine Learning?
   Machine learning deals with making computers learn to make predictions and decisions without explicitly programming them. Instead, a large number of examples of the underlying task are used to optimize a performance criterion and thereby achieve learning.
5. An Example of Machine Learning: Credit Default Prediction
   We have historical data about businesses and their delinquency. The data consists of 100 businesses, each characterized by two attributes: business age in months and number of days delinquent in payment. We also know whether each business defaulted or not. Using machine learning, we can build a model to predict the probability that a given business will default. (The slide shows a scatter plot of the data.)
6. Logistic Regression
   - The model used here is called the logistic regression model. Consider the expression p = e^(a0 + a1x1 + ... + akxk) / (1 + e^(a0 + a1x1 + ... + akxk)), where x1, x2, ..., xk are the attributes.
   - In our example, the attributes are business age and number of days of delinquency.
   - The quantity p always lies in the range 0-1 and can thus be interpreted as the probability of the outcome being default or no default.
7. Logistic Regression
   - By simple rewriting, we get: log(p / (1 - p)) = a0 + a1x1 + a2x2 + ... + akxk
   - This ratio is called the log odds.
   - The parameters of the logistic model, a0, a1, ..., ak, are learned via an optimization procedure.
   - The learned parameters can then be deployed in the field to make predictions.
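Since the parameters are learned via an optimization procedure, here is a minimal sketch of that idea: plain gradient descent on the log-loss for a tiny made-up dataset. The data, learning rate, and iteration count are all illustrative assumptions, not the slide's credit data.

```python
import math

# Tiny illustrative dataset: two attributes per example, binary labels.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 2.0], [2.0, 3.0]]
y = [0, 0, 0, 1, 1, 1]

a = [0.0, 0.0, 0.0]  # [a0 (intercept), a1, a2], all start at zero

def p_of(x):
    """Logistic model output p for one example x."""
    z = a[0] + a[1] * x[0] + a[2] * x[1]
    return 1.0 / (1.0 + math.exp(-z))

# Gradient descent on the log-loss; (p - t) is the derivative w.r.t. z.
for _ in range(5000):
    grad = [0.0, 0.0, 0.0]
    for x, t in zip(X, y):
        err = p_of(x) - t
        grad[0] += err
        grad[1] += err * x[0]
        grad[2] += err * x[1]
    a = [ai - 0.1 * g for ai, g in zip(a, grad)]

print(a)  # the learned parameters a0, a1, a2
```

After training, the model assigns p > 0.5 to the positive examples and p < 0.5 to the negative ones.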
8. Model Details and Performance
   Plot of the predicted default probability for each of the 100 businesses. Only in rare cases do we get a 100% accurate model.
9. Using the Model
   - What is the probability of a business defaulting, given that the business has been with the bank for 26 months and is delinquent for 58 days?
   - Plug in the model parameters (BUSAGE: 0.008; DAYSDELQ: 0.102; intercept: -5.706) to calculate p: e^(0.008*26 + 0.102*58 - 5.706) / (1 + e^(0.008*26 + 0.102*58 - 5.706)) = 0.603
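The arithmetic above can be reproduced in a few lines of Python, using the coefficients stated on the slide:

```python
import math

# Model parameters from the slide: BUSAGE 0.008, DAYSDELQ 0.102, intercept -5.706.
a_busage, a_daysdelq, intercept = 0.008, 0.102, -5.706

def default_probability(busage_months, days_delinquent):
    """Logistic model: p = e^z / (1 + e^z) for the linear combination z."""
    z = intercept + a_busage * busage_months + a_daysdelq * days_delinquent
    return math.exp(z) / (1 + math.exp(z))

print(round(default_probability(26, 58), 3))  # 0.603, as on the slide
```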
10. Why Machine Learning and Why Now?
11. Why Machine Learning?
12. Buzz about Machine Learning
    "Every company is now a data company, capable of using machine learning in the cloud to deploy intelligent apps at scale, thanks to three machine learning trends: data flywheels, the algorithm economy, and cloud-hosted intelligence."
    Three factors are making machine learning hot: cheap data, the algorithm economy, and cloud-based solutions.
13. Data is Getting Cheaper
    For example, Tesla has 780 million miles of driving data, and adds another million every 10 hours.
14. Algorithmic Economy
15. Algorithm Economy Players in ML
16. Cloud-Based Intelligence
    Emerging machine intelligence platforms hosting pre-trained machine learning models-as-a-service are making it easy for companies to get started with ML, allowing them to rapidly take their applications from prototype to production.
    Many open source machine learning and deep learning frameworks running in the cloud allow easy leveraging of pre-trained, hosted models to tag images, recommend products, and do general natural language processing tasks.
17. An Example
18. Apps for Excel
19. Machine Learning Terminology
20. Feature Vectors in ML
    A machine learning system builds models using properties of the objects being modeled. These properties are called features or attributes, and the process of measuring or obtaining such properties is called feature extraction. It is common to represent the properties of objects as feature vectors, e.g. for an iris flower x = [x1, x2, x3, x4]^T, where the components are sepal width, sepal length, petal width, and petal length.
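As a concrete, illustrative example of feature extraction, the four iris measurements (made-up values here) can be packed into a fixed-order vector:

```python
# One measured flower; the numbers are illustrative.
flower = {"sepal_width": 3.5, "sepal_length": 5.1,
          "petal_width": 0.2, "petal_length": 1.4}

# A fixed attribute order gives every flower the same vector layout.
attributes = ["sepal_width", "sepal_length", "petal_width", "petal_length"]
x = [flower[name] for name in attributes]
print(x)  # [3.5, 5.1, 0.2, 1.4], a 4-dimensional feature vector
```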
21. Learning Styles
    - Supervised Learning
      - Training data comes with answers, called labels
      - The goal is to produce labels for new data
22. Supervised Learning Models
    - Classification models
      - Predict whether a customer is likely to be lost to a competitor
      - Tag objects in a given image
      - Determine whether an incoming email is spam or not
23. Supervised Learning Models
    - Regression models
      - Predict the credit card balance of customers
      - Predict the number of 'likes' for a posting
      - Predict the peak load for a utility given weather information
24. Learning Styles
    - Unsupervised Learning
      - Training data comes without labels
      - The goal is to group data into different categories based on similarities
25. Unsupervised Learning Models
    - Segment/cluster customers into different groups
    - Organize a collection of documents based on their content
    - Make recommendations for products
26. Learning Styles
    - Reinforcement Learning
      - Training data comes without labels
      - The learning system receives feedback from its operating environment to know how well it is doing
      - The goal is to perform better
27. Overview of Machine Learning Methods
28. Walk Through an Example: Flower Classification
    Build a classification model to differentiate between two classes of flower.
29. How Do We Go About It?
    - Collect a large number of both types of flowers with the help of an expert
    - Measure some attributes that can help differentiate between the two types of flowers. Let those attributes be petal area and sepal area.
30. Scatter plot of 100 examples of flowers.
31. We can separate the flower types using the linear boundary shown above. The parameters of the line represent the learned classification model.
32. Another possible boundary. This boundary cannot be expressed via an equation; however, a tree structure can be used to express it. Note that this boundary gives better predictions on the collected data.
33. Yet another possible boundary. This boundary does prediction without any error. Is this a better boundary?
34. Model Complexity
    - There are tradeoffs between the complexity of models and their performance in the field. A good design (model choice) weighs these tradeoffs.
    - A good design should avoid overfitting. How? Divide the entire data into three sets:
      - Training set (about 70% of the total data). Use this set to build the model.
      - Test set (about 20% of the total data). Use this set to estimate the model accuracy after deployment.
      - Validation set (remaining 10% of the total data). Use this set to determine the appropriate settings for free parameters of the model. May not be required in some cases.
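A minimal sketch of the 70/20/10 split, in plain Python so the mechanics are visible. The data is a stand-in for 100 labeled examples; in practice a library helper such as scikit-learn's train_test_split is typically used.

```python
import random

data = list(range(100))         # stand-in for 100 labeled examples
random.Random(0).shuffle(data)  # shuffle before splitting

train = data[:70]        # 70%: fit the model
test = data[70:90]       # 20%: estimate post-deployment accuracy
validation = data[90:]   # 10%: tune free parameters (if needed)
print(len(train), len(test), len(validation))  # 70 20 10
```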
35. Measuring Model Performance
    - True Positive: correctly identified as relevant
    - True Negative: correctly identified as not relevant
    - False Positive: incorrectly labeled as relevant
    - False Negative: incorrectly labeled as not relevant
    (The slide illustrates the four cases with cat vs. no-cat images.)
36. Precision, Recall, and Accuracy
    - Precision: percentage of positive labels that are correct
      - Precision = (# true positives) / (# true positives + # false positives)
    - Recall: percentage of positive examples that are correctly labeled
      - Recall = (# true positives) / (# true positives + # false negatives)
    - Accuracy: percentage of correct labels
      - Accuracy = (# true positives + # true negatives) / (# of samples)
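The three formulas can be computed directly from confusion counts; the counts below are made up for illustration:

```python
# Illustrative confusion counts: true/false positives and negatives.
tp, tn, fp, fn = 40, 45, 5, 10

precision = tp / (tp + fp)              # 40 / 45
recall = tp / (tp + fn)                 # 40 / 50
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 85 / 100
print(round(precision, 3), recall, accuracy)  # 0.889 0.8 0.85
```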
37. Sum-of-Squares Error for Regression Models
    For a regression model, the error is measured by taking the square of the difference between the predicted output value and the target value for each training (test) example, and summing this quantity over all examples.
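A short illustration of the sum-of-squares error with made-up predictions and targets:

```python
# Illustrative predicted values and their targets.
predicted = [2.5, 0.0, 2.1, 7.8]
target = [3.0, -0.5, 2.0, 7.0]

# Square each prediction-target difference and add over all examples.
sse = sum((p - t) ** 2 for p, t in zip(predicted, target))
print(round(sse, 2))  # 0.25 + 0.25 + 0.01 + 0.64 = 1.15
```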
38. Bias and Variance
    - Bias: expected difference between the model's predictions and the truth
    - Variance: how much the model differs among training sets
    - Model Scenarios
      - High bias: model makes inaccurate predictions on training data
      - High variance: model does not generalize to new datasets
      - Low bias: model makes accurate predictions on training data
      - Low variance: model generalizes to new datasets
39. The Guiding Principle for Model Selection: Occam's Razor
40. Model Building Algorithms
    - Supervised learning algorithms
      - Linear methods
      - k-NN classifiers
      - Neural networks
      - Support vector machines
      - Decision trees
      - Ensemble methods
41. Illustration of k-NN Model
    - Predicted label of the test example with a 1-NN model: Versicolor
    - Predicted label of the test example with a 3-NN model: Virginica
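A minimal k-NN sketch mirroring the slide: the single nearest neighbour and the majority vote of the three nearest can disagree. The points and labels below are made up for illustration, not the slide's actual data.

```python
import math
from collections import Counter

# Labeled training examples: (petal_area, sepal_area) -> species.
train = [((2.1, 2.0), "Versicolor"),
         ((2.5, 2.5), "Virginica"),
         ((2.6, 2.4), "Virginica"),
         ((0.5, 0.5), "Versicolor")]
test_point = (2.0, 2.0)

def knn_predict(point, k):
    """Majority label among the k nearest training examples."""
    by_distance = sorted(train, key=lambda item: math.dist(point, item[0]))
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

print(knn_predict(test_point, 1))  # Versicolor (single nearest neighbour)
print(knn_predict(test_point, 3))  # Virginica (majority of three nearest)
```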
42. Illustration of Decision Tree Model
    - Petal width <= 0.8? Yes: Setosa. No: petal length <= 4.75? Yes: Versicolor. No: Virginica.
    - The decision tree is automatically generated by a machine learning algorithm.
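The slide's tree can be hand-coded as nested conditions to show how a learned tree makes a prediction; in practice the split thresholds are found automatically by the learning algorithm, and the sample measurements below are illustrative.

```python
def classify_iris(petal_length, petal_width):
    """Apply the slide's two decision-tree splits."""
    if petal_width <= 0.8:
        return "Setosa"
    if petal_length <= 4.75:
        return "Versicolor"
    return "Virginica"

print(classify_iris(petal_length=1.4, petal_width=0.2))  # Setosa
print(classify_iris(petal_length=4.5, petal_width=1.3))  # Versicolor
print(classify_iris(petal_length=5.5, petal_width=2.0))  # Virginica
```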
43. Model Building Algorithms
    - Unsupervised learning
      - k-means clustering
      - Agglomerative clustering
      - Self-organizing feature maps
      - Recommendation systems
44. K-means Clustering
    Step 1: Choose the number of clusters, k, and the initial cluster centers.
45. K-means Clustering
    Step 2: Assign data points to clusters based on distance to the cluster centers.
46. K-means Clustering
    Step 3: Update the cluster centers and reassign data points. The algorithm minimizes the sum of squared distances from data points to their cluster centers: minimize sum over n = 1..N of ||x_n - center_n||^2
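The three steps above can be sketched as a short one-dimensional k-means loop; the data points and initial centers are made up for illustration, and the sketch assumes no cluster goes empty for this data.

```python
# Six 1-D points forming two obvious groups, and k = 2 initial centers.
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers = [0.0, 5.0]

for _ in range(10):  # repeat assignment + update until stable
    clusters = [[], []]
    for x in points:
        # Step 2: assign each point to its nearest center.
        nearest = min(range(2), key=lambda i: abs(x - centers[i]))
        clusters[nearest].append(x)
    # Step 3: move each center to the mean of its assigned points.
    centers = [sum(c) / len(c) for c in clusters]

print([round(c, 3) for c in centers])  # [1.0, 8.0], the two group means
```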
47. Illustration of Recommendation System
48. (Image-only slide.)
49. Steps Toward a Machine Learning Project
    - Collect data
    - Explore data via scatter plots and histograms; remove duplicates and data records with missing values
    - Check for dimensionality reduction
    - Build the model (an iterative process)
    - Transport/integrate with an application
50. Machine Learning to Deep Learning
51. Machine Learning Limitation
    - Machine learning methods operate on manually designed features.
    - The design of such features for tasks involving computer vision, speech understanding, and natural language processing is extremely difficult. This puts a limit on the performance of the system.
    Pipeline: feature extractor -> trainable classifier
52. Processing Sensory Data is Hard
    How do we bridge this gap between the pixels and meaning via machine learning?
53. Sensory Data Processing is Challenging
    So why not build integrated learning systems that perform end-to-end learning, i.e., learn the representation as well as the classification from raw data, without any engineered features?
    Pipeline: feature learner -> trainable classifier
    An approach performing end-to-end learning, typically through a series of successive abstractions, is in a nutshell deep learning.
54. An Example of Deep Learning Capability
    SegNet is a deep learning architecture for pixel-wise semantic segmentation from the University of Cambridge.
55. Summary
    - We have just skimmed the surface of machine learning
    - The web is full of reading resources (free books, lecture notes, blogs, videos) to dig into machine learning
    - Several open source software resources (R, RapidMiner, scikit-learn, etc.) allow learning via experimentation
    - Applications based on vision, speech, and natural language processing are excellent candidates for deep learning
56. Contact
    isethi@oakland.edu
    https://iksinc.wordpress.com/home/
    iksinc@yahoo.com