Data Science Demystified

Slides for a talk given at "The Conference Formerly Known as Conversion Hotel" in November 2019. Covers what data science is, what data scientists do, and how you can start learning data science skills.

Published in: Technology

  1. 1. Data Science Demystified Emily Robinson
  2. 2. What is Data Science?
  3. 3. A common question
  4. 4. What is data science?
  5. 5. What do data scientists do?
  6. 6. What is Data Science? The reality
  7. 7. AI needs a lot of support https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
  8. 8. What is a data scientist?
  9. 9. What do data scientists do? The Reality
  10. 10. What do data scientists do? The Reality
  11. 11. One definition: "Data science is the discipline of making data useful" (Cassie Kozyrkov, https://hackernoon.com/what-on-earth-is-data-science-eb1237d8cb37)
  12. 12. Classic data science venn diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  13. 13. Our (slightly updated) version
  14. 14. Programming OR
  15. 15. Benefit of programming: Accessibility • Web APIs • SQL Databases • Historical Data • R packages shown: httr, DBI, dbplyr (Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo)
  16. 16. Benefit of programming: Efficiency • Code away repetitive tasks • Code around obstacles • Limit human error (Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo)
  17. 17. Benefit of programming: Collaboration • Increased shareability • Communicable processes • Dependable replicability (Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo)
  18. 18. Benefits of programming • Accessibility: Web APIs, SQL Databases, Historical Data (R packages shown: httr, DBI, dbplyr) • Efficiency: Code away repetitive tasks, Code around obstacles, Limit human error • Collaboration: Increased shareability, Communicable processes, Dependable replicability (Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo)
  19. 19. Mathematics & statistics 1. What techniques exist • I need to group customers together -> I should try clustering 2. How to apply them • How to do a k-means clustering in R/Python 3. How to choose which to try • What clustering method will work best?
  20. 20. Statistics: going beyond the numbers. Which is greater: 4 out of 10 or 390 out of 1000?
  21. 21. Statistics: going beyond the numbers Who are the worst batters? http://varianceexplained.org/r/credible_intervals_baseball/ Who are the best batters?
  22. 22. Statistics: going beyond the numbers http://varianceexplained.org/r/credible_intervals_baseball/
  23. 23. Domain knowledge • Business question: How can we split our customers into different groups to market to? • Data science question: How can we run a clustering algorithm to segment customer data? • Data science answer: A k-means clustering found 3 distinct groups • Business answer: Here are 3 types of customers: new, high spending, commercial (Renee Teate, @BecomingDataSci) Skills: • Communication • Empathy • Understanding your data (where it lives, built-in assumptions, edge cases)
  24. 24. Three sub-categories https://www.linkedin.com/pulse/one-data-science-job-doesnt-fit-all-elena-grewal/
  25. 25. Analytics Pulled from Airbnb Careers • Define and evaluate key metrics • Develop dashboards • Communicate analyses • Comfortable in SQL • Industry experience
  26. 26. Algorithms From Airbnb Careers • Deep Learning techniques • Natural language processing • Strong programming skills • Developing ML models at scale in
  27. 27. Inference From Airbnb Careers • Run strategic analysis • Design experiments • Improve statistical methodology • PhD in quantitative field
  28. 28. Three completely different job descriptions (from Airbnb Careers) Algorithms: • Deep Learning techniques • Natural language processing • Strong programming skills • Developing ML models at scale in Analytics: • Define and evaluate key metrics • Develop dashboards • Communicate analyses • Comfortable in SQL • Industry experience Inference: • Run strategic analysis • Design experiments • Improve statistical methodology • PhD in quantitative field
  29. 29. How Do I Grow my Data Science Skills?
  30. 30. Become a data scientist in 3 easy steps https://towardsdatascience.com/from-data-analyst-to-data-scientist-f67a724ea265, Ben Stanbury
  31. 31. “Must know” lists
  32. 32. You don’t need to know everything
  33. 33. #1 Advice Practice by making data science personal projects
  34. 34. Why? “Don’t get stressed about keeping up with the cutting edge of the field … You should start by getting very comfortable transforming and visualizing data, programming with a wide variety of packages, and using statistical techniques like hypothesis tests, classification, and regression.” - David Robinson, Data Insights Engineering Manager at Flatiron Health, Chapter 4
  35. 35. How?
  36. 36. Dataset -> Question
  37. 37. Dataset -> Question
  38. 38. Question -> Dataset https://theambitiouseconomist.com/an-analysis-of-the-gender-wage-gap-in-australia/
  39. 39. Tip 1: Include visualizations https://hackernoon.com/more-than-a-million-pro-repeal-net-neutrality-comments-were-likely-faked-e9f0e3ed36a6
  40. 40. Tip 2: Choose a topic you’re excited about https://masalmon.eu/2018/01/01/sortinghat/
  41. 41. Tip 3: Limit your scope https://kkulma.github.io/2017-08-13-friendships-among-top-r-twitterers/
  42. 42. Making progress (inspired by bit.ly/drob-rstudio-2019) • How I used to think about analyses (less valuable -> more valuable): Idea, Getting data, Cleaning, Exploratory, Modeling, Final result • How I think about analyses now (less valuable -> more valuable): Work only on your computer, Work online (GitHub, Blog, Kaggle)
  43. 43. The full process
  44. 44. Put it on GitHub
  45. 45. Conclusion
  46. 46. The potential future of data scientists From https://hbr.org/2018/08/what-data-scientists-really-do-according-to-35-data-scientists “It wouldn’t surprise me if the [data scientist] title goes the way of the ‘webmaster’” - Hilary Mason
  47. 47. Resources • Day in the life of a data scientist webinar by David Robinson • You’re not paid to model by Jacqueline Nolis • Doing data science at Twitter by Robert Chang • Succeeding as a data scientist in small companies/startups by Randy Au • How to change careers and become a data scientist – one quant’s experience by Rachel Thomas • What data scientists really do, according to 35 data scientists by Hugo Bowne-Anderson
  48. 48. Getting started with data science
  49. 49. Thank you! hookedondata.org @robinson_es datascicareer.com 40% off w/ pcrobinson
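
Slides 19 and 23 talk about running a k-means clustering to split customers into groups. The slides do not show code, so the following is only a minimal sketch of the idea in plain Python. The customer numbers, the two features (orders per year, average order value) and the farthest-point starting rule are all assumptions made for illustration; in practice a library such as scikit-learn and properly scaled features would be the more usual choice.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-point start (illustrative)."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from every center chosen so far. This is a simple stand-in
    # for smarter initialisations, chosen here only to keep the run repeatable.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )

    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical customers: (orders per year, average order value).
customers = [
    (1, 20), (2, 25), (1, 30),            # look like new customers
    (10, 200), (12, 220), (11, 210),      # look like high spenders
    (50, 1000), (55, 1100), (52, 1050),   # look like commercial accounts
]

labels, centers = kmeans(customers, k=3)
for customer, label in zip(customers, labels):
    print(customer, "-> group", label)
```

The algorithm only returns group numbers. Naming the groups "new", "high spending" and "commercial" is the domain-knowledge step that slide 23 describes as turning the data science answer into a business answer.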

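Slide 20 asks whether 4 out of 10 or 390 out of 1000 is greater, and slides 21 and 22 point to a post on credible intervals. One way to see why the raw proportions mislead is to compare how uncertain each estimate is. The sketch below approximates a 95% credible interval by sampling from a Beta posterior. It assumes a uniform Beta(1, 1) prior, which is a simplification chosen for this example and not the prior used in the linked post; the function name and its parameters are likewise hypothetical.

```python
import random


def credible_interval(successes, trials, level=0.95, draws=20000, seed=0):
    """Approximate a Beta posterior credible interval by sampling.

    Assumes a uniform Beta(1, 1) prior, so the posterior is
    Beta(successes + 1, failures + 1).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    alpha = successes + 1
    beta = trials - successes + 1
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1 - level) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return lower, upper


small = credible_interval(4, 10)       # 4 out of 10
large = credible_interval(390, 1000)   # 390 out of 1000

print("4/10     interval: %.3f to %.3f" % small)
print("390/1000 interval: %.3f to %.3f" % large)
```

The interval for 4 out of 10 comes out much wider than the one for 390 out of 1000, so although 0.40 is larger than 0.39 on paper, the second figure is the more trustworthy estimate.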