Demystifying Data Science with an introduction to Machine Learning

Demystifying Data Science is the slide deck accompanying @brightsparc's presentation to SEEK.

Presentation Transcript

  • Demystifying Data Science with an Intro to Machine Learning
  • Data science is everywhere
  • Sexiest job in 21st century*
    McKinsey Global Institute report estimates that by 2018, “the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions”
    Source: Harvard Business Review, Oct 2012
  • So what is Data Science?
  • Source: Hilary Mason, ex-chief data scientist at bit.ly
  • Who are these unicorns?
  • A bit about me
    @brightsparc
  • I thought it was all about stats?
  • It’s a broader skillset
    Source: http://blogs.wsj.com/cio/2014/02/14/it-takes-teams-to-solve-the-data-scientist-shortage/
  • Data science pipeline
    Source: http://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext
  • Where does Kaggle fit in?
    Degree breakdown in top 100
    Areas of study
  • What’s the deal with big data?
  • Apache Hadoop Ecosystem
  • It’s like MapReduce, you know
  • So what about machine learning?
    “Give machines the ability to learn without explicitly programming them.”
    Arthur L. Samuel (1959), a pioneer in machine learning who created a checkers game that played itself
  • Types of algorithms
  • Some examples
  • Machine learning process
  • Build a model
    Underfit vs. overfit
    Linear regression: solve for the values of θ in the hypothesis function hθ(x)
  • Gradient descent algorithm
    Minimize the cost function, which is ½ of the average squared error of the predictions against the training data.
  • Demo: House prices
  • Cross validation – split training/test
  • Supervised learning model
  • Recommender systems
    Collaborative filtering – predict ratings for similar items given other users’ behavior
  • Collaborative filtering method
    Source: http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
  • Similar users based on distance
    Manhattan distance
    Euclidean distance
  • Demo: Music recommender system
    Pearson Correlation Coefficient
  • Visualization frameworks
    Tableau
    D3.js
    Processing
    Raphaël.js
  • What about online experimentation?
  • What will the future look like?
      • Online collaboration
      • Open Data
  • Next gen distributed computing
    100x faster in memory, and 10x faster even when running on disk.
  • Deep learning, a new frontier?
    Geoffrey Hinton @Google
  • How can I get started?
      • MOOCs
          – Coursera Machine Learning (Andrew Ng, Stanford)
          – Learning from Data (Abu-Mostafa, Caltech)
      • Other references
          – Collective Intelligence
          – Mining of Massive Datasets
          – Open-Source Data Science Masters
      • Frameworks
          – Python – scikit-learn
          – Java – WEKA and Cascading
  • Questions
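
The "Build a model" and "Gradient descent algorithm" slides describe fitting a linear regression by minimizing a cost equal to half the average squared prediction error. The deck's own demo code is not in the transcript, so the following is only a minimal sketch under assumptions: a single input feature, a hypothesis of the form hθ(x) = θ0 + θ1·x, and made-up data points. The function names, the learning rate `alpha`, and the step count are illustrative choices, not values taken from the presentation.

```python
def predict(theta0, theta1, x):
    """Hypothesis h_theta(x) for one feature (assumed form: theta0 + theta1 * x)."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Half of the average squared error of the predictions on the training data."""
    m = len(xs)
    squared_errors = ((predict(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)


def gradient_descent(xs, ys, alpha=0.05, steps=5000):
    """Batch gradient descent: repeatedly step both parameters downhill on the cost.

    alpha (learning rate) and steps are illustrative defaults, not tuned values.
    """
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [predict(theta0, theta1, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # d(cost)/d(theta0)
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # d(cost)/d(theta1)
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Made-up training data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta0, theta1 = gradient_descent(xs, ys)
print("theta0 =", round(theta0, 4), "theta1 =", round(theta1, 4))
print("final cost =", cost(theta0, theta1, xs, ys))
```

On real data such as house prices, features usually need scaling before a fixed learning rate like this behaves well, and a learning rate that is too large makes the cost grow instead of shrink.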
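
The "Cross validation – split training/test" slide names the idea of holding some data back so a model is scored on examples it was not trained on. The transcript gives no detail on how the split was done in the talk, so this is a sketch under assumptions: a shuffled hold-out split plus a simple k-fold variant, written with the standard library only. The helper names `train_test_split` and `k_fold`, the 25% test fraction, and the seed are hypothetical choices for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def k_fold(rows, k):
    """Yield (train, test) pairs so that every row is used for testing exactly once."""
    folds = [rows[i::k] for i in range(k)]  # deal the rows round-robin into k folds
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test


# Made-up dataset: eight row identifiers standing in for real examples.
rows = list(range(8))

train, test = train_test_split(rows)
print("hold-out sizes:", len(train), "train /", len(test), "test")

for train_rows, test_rows in k_fold(rows, k=4):
    print("fold test rows:", test_rows)
```

A model fitted on the training rows and scored on the test rows gives a rough check for the underfit and overfit cases mentioned on the "Build a model" slide: a large gap between training and test error suggests overfitting.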
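
The recommender slides list three ways to compare users: Manhattan distance, Euclidean distance, and the Pearson correlation coefficient. The music demo itself is not in the transcript, so the sketch below only illustrates those three measures under assumptions: each user is a dictionary of item ratings, only items rated by both users are compared, and the users, songs, and ratings are invented. The helper `most_similar` is a hypothetical name for picking the nearest neighbour by Pearson correlation.

```python
from math import sqrt


def shared_items(a, b):
    """Items rated by both users; only these are compared."""
    return sorted(set(a) & set(b))


def manhattan(a, b):
    """Sum of absolute rating differences (smaller means more similar)."""
    return sum(abs(a[i] - b[i]) for i in shared_items(a, b))


def euclidean(a, b):
    """Straight-line distance between the two rating vectors."""
    return sqrt(sum((a[i] - b[i]) ** 2 for i in shared_items(a, b)))


def pearson(a, b):
    """Pearson correlation in [-1, 1]; 0.0 when it is undefined."""
    items = shared_items(a, b)
    n = len(items)
    if n == 0:
        return 0.0
    mean_a = sum(a[i] for i in items) / n
    mean_b = sum(b[i] for i in items) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in items)
    var_a = sum((a[i] - mean_a) ** 2 for i in items)
    var_b = sum((b[i] - mean_b) ** 2 for i in items)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user who rates everything the same has no correlation
    return cov / sqrt(var_a * var_b)


def most_similar(name, ratings):
    """Hypothetical helper: the other user with the highest Pearson correlation."""
    others = [user for user in ratings if user != name]
    return max(others, key=lambda user: pearson(ratings[name], ratings[user]))


# Invented ratings on a 1-5 scale.
ratings = {
    "alice": {"song_a": 5, "song_b": 3, "song_c": 4},
    "bob": {"song_a": 4, "song_b": 2, "song_c": 3, "song_d": 5},
    "carol": {"song_a": 1, "song_b": 5, "song_c": 2},
}

print("manhattan(alice, bob):", manhattan(ratings["alice"], ratings["bob"]))
print("euclidean(alice, bob):", euclidean(ratings["alice"], ratings["bob"]))
print("pearson(alice, carol):", pearson(ratings["alice"], ratings["carol"]))
print("most similar to alice:", most_similar("alice", ratings))
```

In this made-up data, bob rates every shared song exactly one point lower than alice. The two distance measures count that as a difference, while Pearson correlation ignores a constant offset and treats the two users as moving together, which is one reason it is often used when people rate on different personal scales.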