Demystifying Data Science with an introduction to Machine Learning

Demystifying Data Science is the slide deck that accompanies @brightsparc's presentation to SEEK.

Published in: Internet, Technology, Education

  1. Demystifying Data Science with an Intro to Machine Learning
  2. Data science is everywhere
  3. Sexiest job in the 21st century*. McKinsey Global Institute report estimates that by 2018, “the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions”. *Source: Harvard Business Review, Oct 2012
  4. So what is Data Science?
  5. Source: Hilary Mason, ex-chief data scientist at bit.ly
  6. Who are these unicorns?
  7. Bit about me: @brightsparc
  8. I thought it was all about stats?
  9. It’s a broader skillset. Source: http://blogs.wsj.com/cio/2014/02/14/it-takes-teams-to-solve-the-data-scientist-shortage/
  10. Data science pipeline. Source: http://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext
  11. Where does Kaggle fit in? Degree breakdown in top 100; areas of study
  12. What’s the deal with big data?
  13. Apache Hadoop Ecosystem
  14. It’s like MapReduce, you know
  15. So what about machine learning? “Give machines the ability to learn without explicitly programming them.” Arthur L. Samuel (1959), a pioneer in machine learning who created a checkers game that played itself
  16. Types of algorithms
  17. Some examples
  18. Machine learning process
  19. Build a model: underfit vs. overfit. Linear regression: solve for the values of θ in the hypothesis function hθ(x)
  20. Gradient descent algorithm: minimize the cost function, which is ½ of the average squared error of the predictions against the training data
  21. Demo: House prices
  22. Cross validation: split training/test
  23. Supervised learning model
  24. Recommender systems. Collaborative filtering: predict ratings for similar items given other users’ behavior
  25. Collaborative filtering method. Source: http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
  26. Similar users based on distance: Manhattan distance, Euclidean distance
  27. Demo: Music recommender system. Pearson correlation coefficient
  28. Visualization frameworks: Tableau, D3.js, Processing, Raphaël.js
  29. What about online experimentation?
  30. What will the future look like? • Online collaboration • Open Data
  31. Next-gen distributed computing: 100x faster in memory, and 10x faster even when running on disk
  32. Deep learning, a new frontier? Geoffrey Hinton @Google
  33. How can I get started?
      • MOOCs
        - Coursera Machine Learning (Andrew Ng, Stanford)
        - Learning from Data (Abu-Mostafa, Caltech)
      • Other references
        - Collective Intelligence
        - Mining of Massive Datasets
        - Open-Source Data Science Masters
      • Frameworks
        - Python: scikit-learn
        - Java: WEKA and Cascading
  34. Questions
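
The linear regression and gradient descent slides describe solving for θ by minimizing a cost equal to half the average squared error. A minimal sketch of that idea for a single feature follows, in plain Python. The data points, the learning rate `alpha`, the iteration count, and the function names are illustrative assumptions; none of them come from the deck or its house-price demo.

```python
def cost(theta0, theta1, xs, ys):
    """Half the average squared error of h(x) = theta0 + theta1 * x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit theta0 and theta1 by batch gradient descent on cost()."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction error for every training example at the current thetas.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to each theta.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters together, stepping against the gradient.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Made-up training data that lies exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3), cost(theta0, theta1, xs, ys))
```

With real data such as house sizes and prices, features usually need scaling first. Otherwise a learning rate that works on this toy data can make the updates diverge.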

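
The recommender slides name three ways to compare users: Manhattan distance, Euclidean distance, and the Pearson correlation coefficient. The sketch below computes all three on ratings that two users gave to the same items. The users, their ratings, and the function names are hypothetical and are not taken from the music recommender demo.

```python
import math


def manhattan(a, b):
    """Sum of absolute rating differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation coefficient, from -1 (opposite) to 1 (same taste)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # Undefined when a user rates everything the same; 0 is a chosen fallback.
    return cov / math.sqrt(var_a * var_b)


# Hypothetical ratings for the same four items.
alice = [5, 3, 4, 4]
bob = [3, 1, 2, 2]    # Same preferences as alice, but always rates 2 lower.
carol = [1, 5, 2, 2]  # Roughly the opposite taste to alice.

print(manhattan(alice, bob), euclidean(alice, bob), pearson(alice, bob))
print(pearson(alice, carol))
```

The two distance measures treat alice and bob as far apart because bob rates everything lower. Pearson correlation ignores that offset and scores them as identical in taste, which is why it suits users who rate on different scales.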