Data Science Lessons I have learned in 5 years - Boxun Zhang, GoEuro


Since 2013 I have been working as a data scientist, one of today’s hottest jobs in the IT industry. During this time, I have had the opportunity to experience the evolution of the data science landscape and to see what worked and what didn’t.

In this presentation, I will present some of the best lessons I have learned in the past 5 years, such as foundations for building a data science team, efficient ways for data scientists to work with other teams, skills that data scientists should have, and common fallacies in data science work.

  1. What I Learned About Data Science in the Past 5 Years, by Boxun Zhang
  2. About me: • Currently leading data science at GoEuro • 5 years at Spotify & a brief time at SoundCloud • Ph.D. in Computer Science, on “How do people use BitTorrent”
  3. Lesson 1: Reproducibility is everything
  4. “Non-reproducible single occurrences are of no significance to science.” (Karl Popper)
  5. Non-reproducible work = lost knowledge
  6. Non-reproducible work = little value
  7. What should reproducible work look like?
  8. What should reproducible work look like? Someone can understand the work and produce the same results using only the documentation, report, code, and data, without help from the original author.
  9. Code for the maintainer
  10. How to get there?
  11. How to get there? Code like engineers
  12. How to get there? Code like engineers; version control data
  13. How to get there? Code like engineers; version control data; review insights
  14. Lesson 2: Understand the basics well
  15. One common challenge
  16. One common challenge: How to learn data science stuff effectively?
  17. One common challenge: How to learn data science stuff effectively? Machine learning
  18. One common challenge: How to learn data science stuff effectively? Machine learning, data mining
  19. One common challenge: How to learn data science stuff effectively? Machine learning, data mining, graph mining
  20. One common challenge: How to learn data science stuff effectively? Machine learning, data mining, graph mining, information retrieval
  21. Statistics
  22. Statistics, linear algebra
  23. Statistics, linear algebra, data structures & algorithms
  24. What about cutting-edge stuff?
  25. Research papers
  26. Research papers, student internships
  27. Lesson 3: Machine learning is the least difficult part
  28. XKCD: Machine Learning
  29. Most models and algorithms are commodities today
  30. Training a model or developing an algorithm is straightforward
  31. Training a model or developing an algorithm is straightforward once a problem is well defined
  32. The hard parts
  33. Identifying new problems from the business
  34. Translating machine insights back to the business
  35. Image from http://simplemindinc.com
  36. Image from http://simplemindinc.com: complex models -> a few bullet points
  37. Lesson 4: The best interview question
  38. How to discover a candidate’s true potential
  39. What the candidate knows; what interviewers might ask
  40. “Please introduce one of your previous projects.”
  41. “Please introduce one of your previous projects.” In other words, teach us something
  42. Not specific
  43. Not specific; the candidate is the expert
  44. Not specific; the candidate is the expert; easy to ask questions
  45. Not specific; the candidate is the expert; easy to ask questions; always learn something
  46. Lesson 5: The unwritten role
  47. Data scientists often start with predefined roles
  48. Data scientists often start with predefined roles and work on given tasks from stakeholders
  49. The role of data scientist is still kinda new and unique
  50. The role of data scientist is still kinda new and unique: access to vast amounts of data
  51. The role of data scientist is still kinda new and unique: access to vast amounts of data; powerful tools for mining insights
  52. Data scientists can also be explorers
  53. Data scientists can also be explorers, to explore the unknowns
  54. Thank you! Email: zhangboxun@gmail.com; Twitter: @BoxunZhang; Quora: Boxun Zhang
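
The slides on reproducibility list "version control data" as one step but do not say how to do it. One possible reading, offered only as a minimal sketch and not as the speaker's method, is to commit a small manifest of content hashes next to the analysis code, so a later reader can check that they are running against the same data. The function names and the manifest layout below are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_paths, manifest_path):
    """Record each data file's hash so the manifest can be committed with the code."""
    manifest = {str(p): file_sha256(p) for p in data_paths}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def verify_manifest(manifest_path):
    """Return the paths whose current content no longer matches the manifest."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [p for p, expected in manifest.items() if file_sha256(p) != expected]
```

Calling `write_manifest` when an analysis is finished and `verify_manifest` before rerunning it gives a cheap check that the inputs have not silently changed. For large datasets a dedicated data versioning tool would likely be a better fit than this hand-rolled manifest.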
