CH2019 keynote: Emily Robinson - Data science demystified

(video summary at: https://conversionhotel.com/session/keynote-2019-data-science-demystified/)

In the conversion optimization industry, most data analytics people are used to working with analytics software and generating insights for, and results of, optimization and validation efforts. A large group of people at #CH2019 have this job. They have been told that the future of analytics is in data science. But when exactly is what they do data science? Do you need to be a coder to call yourself a data scientist? Do you need an official degree, or is having some technical implementation skills enough?

I recently bumped into Emily Robinson at a CRO conference in the US and liked her stage appearance. I learned that she is writing the book “Build A Career in Data Science” with Jacqueline Nolis, to be published in early 2020. She currently works at DataCamp as a Data Scientist on the growth team, where she built their experimentation analytics system. Previously, she was a Data Scientist at Etsy, working with their search team to design, implement, and analyze experiments on the ranking algorithm, UI changes, and new features.

She regularly gives talks on A/B testing, R programming, and data science career advice at conferences and meetups. I thought Emily would be the perfect person to demystify data science for us and explain how to take the first steps toward building a career in it.

Enjoy her talk,

Ton Wesseling

Founder & host of The Conference formerly known as Conversion Hotel

Published in: Data & Analytics

  1. Data Science Demystified, Emily Robinson
  2. What is Data Science?
  3. A common question
  4. What is data science?
  5. What do data scientists do?
  6. What is Data Science? The reality
  7. AI needs a lot of support. https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
  8. What is a data scientist?
  9. What do data scientists do? The Reality
  10. What do data scientists do? The Reality
  11. One definition: Data science is the discipline of making data useful. Cassie Kozyrkov, https://hackernoon.com/what-on-earth-is-data-science-eb1237d8cb37
  12. Classic data science Venn diagram. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  13. Our (slightly updated) version
  14. Programming OR
  15. Benefit of programming: Accessibility • Web APIs (httr) • SQL Databases (DBI) • Historical Data (dbplyr). Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo
  16. Benefit of programming: Efficiency • Code away repetitive tasks • Code around obstacles • Limit Human Error. Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo
  17. Benefit of programming: Collaboration • Increased Shareability • Communicable Processes • Dependable Replicability. Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo
  18. Benefits of programming. Accessibility: • Web APIs (httr) • SQL Databases (DBI) • Historical Data (dbplyr). Efficiency: • Code away repetitive tasks • Code around obstacles • Limit Human Error. Collaboration: • Increased Shareability • Communicable Processes • Dependable Replicability. Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo
  19. Mathematics & statistics. 1. What techniques exist • I need to group customers together -> I should try clustering. 2. How to apply them • How to do a k-means clustering in R/Python. 3. How to choose which to try • What clustering method will work best?
  20. Statistics: going beyond the numbers. Which is greater? 4 out of 10, or 390 out of 1000?
  21. Statistics: going beyond the numbers. Who are the worst batters? Who are the best batters? http://varianceexplained.org/r/credible_intervals_baseball/
  22. Statistics: going beyond the numbers. http://varianceexplained.org/r/credible_intervals_baseball/
  23. Domain knowledge. Business question: How can we split our customers into different groups to market to? Data science question: How can we run a clustering algorithm to segment customer data? Data science answer: A k-means clustering found 3 distinct groups. Business answer: Here are 3 types of customers: new, high spending, commercial. - Renee Teate, @BecomingDataSci. Skills: • Communication • Empathy • Understanding your data (where it lives, built-in assumptions, edge cases)
  24. Three sub-categories. https://www.linkedin.com/pulse/one-data-science-job-doesnt-fit-all-elena-grewal/
  25. Analytics (pulled from Airbnb Careers) • Define and evaluate key metrics • Develop dashboards • Communicate analyses • Comfortable in SQL • Industry experience
  26. Algorithms (from Airbnb Careers) • Deep Learning techniques • Natural language processing • Strong programming skills • Developing ML models at scale in
  27. Inference (from Airbnb Careers) • Run strategic analysis • Design experiments • Improve statistical methodology • PhD in quantitative field
  28. Three completely different job descriptions (from Airbnb Careers). Algorithms: • Deep Learning techniques • Natural language processing • Strong programming skills • Developing ML models at scale in. Analytics: • Define and evaluate key metrics • Develop dashboards • Communicate analyses • Comfortable in SQL • Industry experience. Inference: • Run strategic analysis • Design experiments • Improve statistical methodology • PhD in quantitative field
  29. How Do I Grow my Data Science Skills?
  30. Become a data scientist in 3 easy steps. https://towardsdatascience.com/from-data-analyst-to-data-scientist-f67a724ea265, Ben Stanbury
  31. “Must know” lists
  32. You don’t need to know everything
  33. #1 Advice: Practice by making data science personal projects
  34. Why? “Don’t get stressed about keeping up with the cutting edge of the field … You should start by getting very comfortable transforming and visualizing data, programming with a wide variety of packages, and using statistical techniques like hypothesis tests, classification, and regression.” - David Robinson, Data Insights Engineering Manager at Flatiron Health, Chapter 4
  35. How?
  36. Dataset -> Question
  37. Dataset -> Question
  38. Question -> Dataset. https://theambitiouseconomist.com/an-analysis-of-the-gender-wage-gap-in-australia/
  39. Tip 1: Include visualizations. https://hackernoon.com/more-than-a-million-pro-repeal-net-neutrality-comments-were-likely-faked-e9f0e3ed36a6
  40. Tip 2: Choose a topic you’re excited about. https://masalmon.eu/2018/01/01/sortinghat/
  41. Tip 3: Limit your scope. https://kkulma.github.io/2017-08-13-friendships-among-top-r-twitterers/
  42. Making progress (inspired by bit.ly/drob-rstudio-2019). How I used to think about analyses: value grows from less valuable to more valuable across the stages Idea, Getting data, Cleaning, Exploratory, Modeling, Final result. How I think about analyses now: less valuable is work only on your computer; more valuable is work online (GitHub, Blog, Kaggle).
  43. The full process
  44. Put it on GitHub
  45. Conclusion
  46. The potential future of data scientists. “It wouldn’t surprise me if the [data scientist] title goes the way of the ‘webmaster’” - Hilary Mason. From https://hbr.org/2018/08/what-data-scientists-really-do-according-to-35-data-scientists
  47. Resources • Day in the life of a data scientist webinar by David Robinson • You’re not paid to model by Jacqueline Nolis • Doing data science at Twitter by Robert Chang • Succeeding as a data scientist in small companies/startups by Randy Au • How to change careers and become a data scientist – one quant’s experience by Rachel Thomas • What data scientists really do, according to 35 data scientists by Hugo Bowne-Anderson
  48. Getting started with data science
  49. Thank you! hookedondata.org • @robinson_es • datascicareer.com • 40% off w/ pcrobinson
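
The question on slide 20 ("4 out of 10" versus "390 out of 1000") can be made concrete with a small sketch. This is an illustration only, not the method used in the talk or in the linked baseball post: it assumes a uniform Beta(1, 1) prior and approximates a 95% credible interval by sampling with the standard library's `random.betavariate`. The function name `credible_interval` and its parameters are hypothetical.

```python
import random


def credible_interval(successes, trials, level=0.95, draws=20000, seed=0):
    """Approximate an equal-tailed credible interval for a success rate.

    Assumption: a uniform Beta(1, 1) prior, so the posterior is
    Beta(1 + successes, 1 + failures). The interval is estimated from
    posterior samples rather than computed exactly.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    a = 1 + successes
    b = 1 + (trials - successes)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1 - level) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return lower, upper


small = credible_interval(4, 10)      # raw rate 0.40, very little data
large = credible_interval(390, 1000)  # raw rate 0.39, much more data

print("4 / 10     -> interval %.3f to %.3f" % small)
print("390 / 1000 -> interval %.3f to %.3f" % large)
```

The raw rate for 4 out of 10 is slightly higher, but its interval is far wider, so the 390 out of 1000 record pins the underlying rate down much more tightly. That is the sense in which the slide asks you to go beyond the numbers.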
