
Lies, Damned Lies and the Data Scientist (Strata Summit 2011)


When it comes to big data insights, how do you know you're asking the right questions? Hiring data scientists is a good start: their numbers are growing both on LinkedIn and at LinkedIn. But even data scientists are not immune to the many hidden pitfalls that keep your key insights out of sight.

Drawing on a deceptively simple exercise that I've used to haze dozens of data scientists on their first day, I will discuss the good, the bad and the ugly lessons we've learned about asking the right questions, choosing denominators and being a data skeptic.

Published in: Technology, Business


  1. @mrogati
  2. The Mission: hottest industries
  3. The data: + date joined LinkedIn
  4. The Question: hottest industries. Hotness (X) = year-over-year growth of people in industry X on LinkedIn
  5. The Question: hottest industries. Hotness (X) = year-over-year growth of people in industry X on LinkedIn
  6. The data
  7. The Question: hottest industries. Hotness (X) = year-over-year growth of people job starters in industry X on LinkedIn
  8. Externa-lies
  9. Externa-lies
  10. Externa-lies
  11. The Question: hottest industries. Hotness (X) = part year-over-part year growth of net job starters in a big enough industry X on LinkedIn
  12. Dirty data, dirty lies
  13. Dirty data, dirty lies: # profiles; # jobs on LinkedIn profile* (* hypothetical data)
  14. Dirty data, dirty lies: check flags, categories, dates, …
  15. Norma-lies
  16. The Question: hottest industries. Hotness (X) = part year-over-part year growth of normalized net job starters, minus noise, in a big enough industry X on LinkedIn
  17. Norma-lies
  18. Truth by omission: Internet, Real Estate, Financial Services
  19. … and the data scientist
  20. @mrogati
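
The Hotness (X) definition on slide 11 can be sketched in Python. This is a minimal illustration under stated assumptions, not the speaker's implementation. The record layout (industry, date, +1 for a job start, -1 for a job end), the function name `hotness`, and the `min_starters` threshold are all hypothetical. "Part year" is read here as comparing the same months of two consecutive years, and "net" as starts minus ends. The normalization and noise removal from slide 16 are not covered.

```python
from collections import Counter
from datetime import date


def hotness(events, year, through_month, min_starters=100):
    """Part-year-over-part-year growth of net job starters per industry.

    events: iterable of (industry, day, delta) tuples, where delta is +1
    for a job start and -1 for a job end (hypothetical record layout).
    Only months 1..through_month of `year` and `year - 1` are compared.
    """
    current, previous = Counter(), Counter()
    for industry, day, delta in events:
        if day.month > through_month:
            continue  # keep the same part of each year
        if day.year == year:
            current[industry] += delta
        elif day.year == year - 1:
            previous[industry] += delta

    scores = {}
    for industry, base in previous.items():
        if base < min_starters:
            continue  # denominator too small: the growth rate would be noise
        scores[industry] = (current[industry] - base) / base
    return scores


if __name__ == "__main__":
    # Made-up events, for illustration only.
    demo = (
        [("Internet", date(2010, 2, 1), 1)] * 4
        + [("Internet", date(2011, 2, 1), 1)] * 6
        + [("Tiny", date(2010, 2, 1), 1)]
        + [("Tiny", date(2011, 2, 1), 1)] * 5
    )
    print(hotness(demo, year=2011, through_month=3, min_starters=2))
```

The threshold on the previous year's count is the "big enough industry" guard: without it, an industry that grows from one job starter to five would top the ranking on a denominator of one.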
