
Clare Corthell: Learning Data Science Online



Clare Corthell, Data Scientist and Designer at Mattermark, and author of the Open Source Data Science Masters, shares her experience teaching herself data science with online resources.

Published in: Data & Analytics


  1. THE OPEN SOURCE DATA SCIENCE MASTERS (THE DIY DATA SCIENTIST) Clare Corthell, Data Scientist at Mattermark, @clarecorthell
  2. Deal intelligence platform: an interface to live data about private companies
  3. TODAY • What a Data Scientist does • Paths to becoming a Data Scientist • Where to start • Navigating a path • Why you should run toward hard things
  4. WHAT DOES A DATA SCIENTIST DO? Data Scientists turn data into knowledge by answering the right questions, which in turn depends on asking the right questions.
  5. HOW DO I BECOME A DATA SCIENTIST? The answer you don't want: there's no paved road, no one way.
  6. PATHS 1. Get a classic master's from an accredited university (warning: I have yet to see one that's better than the OSDSM). 2. Attend a bootcamp or academy: • Zipfian Academy (SF) • Insight Data Science Fellows (Palo Alto, NYC) • Data Science Retreat (Berlin) 3. Teach yourself: • The Open Source Data Science Masters
  7. THEORY & APPLICATION, or why universities haven't figured this out yet. Universities don't focus on "Data Science" because it's tightly bound to application. Universities develop theory; businesses develop applications. The two exist symbiotically and need each other, but their goals are very different.
  8. The Open Source Data Science Masters: • Math • Computing • Algorithms • Distributed Computing • Databases • Data Mining • Machine Learning • Graph Theory • Natural Language Processing • Analysis • Visualization • Python (language & libraries). The internet helps me curate it, hence "Open Source."
  9. (That's a lot.)
  10. CLARE'S PATH Previously: Product Designer and front-end developer. Transcript: 6 months of study. Data Scientist & Machine Learning Developer at Mattermark: my team builds domain-specific systems for classification, recommendation, prediction, crawling, fact extraction, and more. Languages: Python, SQL; machine learning: scikit-learn; data manipulation: pandas, NumPy; matplotlib; NLTK; design: HTML/CSS/JS.
  11. 1. Get a goal 2. Get a plan 3. Get mentorship 4. Get a project
  12. 1. Get a goal. What kind of "Data Scientist" do you want to be? Explore the different roles, pick something that sparks your interest, and find out what those people do on a daily basis.
  13. Rachel Schutt, Doing Data Science
  14. Analyzing the Analyzers, O'Reilly
  15. 2. Get a plan. Figure out what skills you need to be minimally effective. Design a curriculum (fork the OSDSM!). Plan a schedule of study.
  16. Dave Holtz, Airbnb
  17. 3. Get mentorship. Talk to people on Twitter. Ask to buy them coffee (with a specific need or question in hand). Get informational interviews (a lost art; they are low-pressure, but they can turn into real interviews).
  18. 4. Get a question (make it a small question; don't set yourself up for failure). Project: use real-world data to answer a question, e.g. "Who do iguana owners connect to on Twitter?" Work on a real business problem, e.g. "What channels of marketing are working for us?" Help a non-profit* with data they don't understand. *Organizations that coordinate work with NGOs: Bayes Impact, DataKind
  19. Let's talk about where this perfect plan gets really, incredibly difficult. (Let's start with a tautology.)
  20. HARD THINGS ARE HARD "Hard things are hard because there are no easy answers or recipes. They are hard because your emotions are at odds with your logic. They are hard because you don't know the answer and you cannot ask for help without showing weakness." Ben Horowitz, The Hard Thing About Hard Things
  21. When something scares you, run like hell right into it. The hardest things are the things people avoid the most. That's your marginal advantage. Maybe that's why there aren't enough Data Scientists. You will figure it out. It's about ego management and problem solving.
  22. RUN TOWARD HARD THINGS • Choosing what you want to do and what to work on • Not knowing everything • Being overwhelmed • Time management • Math • Coding
  23. Not knowing everything / Being overwhelmed. There are a million things you could learn and work on. That's overwhelming, but you can't afford to get overwhelmed. You won't know everything; it's impractical and impossible. Learn to say "I don't know." FYI: programmers don't read books cover to cover; they reference them as needed.
  24. Time management. How do I do all of this in a reasonable amount of time? You don't. Be rigorous. Ask yourself: will this directly help me achieve my goal? Refine your goals and focus your work. Don't switch tasks; focus on one thing at a time.
  25. Why is time management so hard? We're used to other people telling us what to do: teachers, managers, parents.
  26. CODING IS HARD.
  27. A hint for those new to programming: search Google for "stackoverflow" plus your problem.
  28. Why code?
  29. HUMANS SHOULD BE HUMANS AND COMPUTERS SHOULD BE COMPUTERS. You must code, because automation. And no, there is no shortcut.
  30. YOUR ADVANTAGE Self-study in Data Science is hard, but the energy and commitment you spend teaching yourself is returned to you in: • Choice of professional focus • Respect from potential employers for managing yourself (you want to work with people who will recognize and respect that) • Skills that are tough to get from a university or employer • A path with no gatekeepers: no one will stop you.
  31. Take the first step.
  32. 1. Learn to code in Python. 2. Take Intro to Data Science (UW). 3. Go get a coffee. 4. Ask one question.
  33. I ♥ questions. @clarecorthell