- 1. Ayodele Odubela How to Start Doing Data Science
- 2. About Me ● Data Scientist @ CometML ● MS in Data Science from Regis University ● Teaching Explainable ML ● Author of Getting Started in Data Science ● Currently writing Uncovering Bias in Machine Learning
- 3. Skills 01
- 4. What is Data Science? Data science is an interdisciplinary field that uses (somewhat) scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.
- 5. Coding: SQL, Python, R. Business Sense: Understanding and Creating New Metrics, Deciding which methods work in your industry. Math: Statistics, Probability, Linear Algebra.
- 6. What Data Projects Include Identify a Problem Assess the org’s incentives Gather & clean data Data documentation Exploratory Analysis Inferential Statistics Data Storytelling Harm identification and mitigation Creating ML Models Building User Recourse Frameworks
- 7. Relevant Roles Data Scientist Skills: Advanced SQL, Intermediate Python/R, Intermediate Statistics Machine Learning Engineer Skills: Advanced Python, TensorFlow/PyTorch/Keras, Intermediate Linear Algebra, Calculus, & Statistics Research Scientist Skills: Advanced Math, Science Communications
- 8. Tasks 02
- 9. Data Wrangling Any language can be used to get data from databases and APIs
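
As a rough illustration of pulling data out of a database, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the rows are made up for the example; a real project would connect to an existing database instead.

```python
import sqlite3

# In-memory database standing in for a real data source (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.5), ("ana", 14.5)],
)

# Aggregate in SQL, then pull the result into plain Python objects.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
totals = dict(rows)
conn.close()

print(totals)
```

Doing the aggregation in SQL and only the final handling in Python is one common split; the same idea carries over to R or any other language with a database driver.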
- 10. Data Cleaning Dealing with Missing Values Combining Sparse Categorical Columns
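
A minimal sketch of the two cleaning tasks named on this slide, using only the standard library. Median imputation and a "fewer than `min_count` occurrences" rule are assumptions chosen for simplicity, and the helper names and sample values are hypothetical.

```python
from collections import Counter
from statistics import median


def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    mid = median(observed)
    return [mid if v is None else v for v in values]


def combine_rare(categories, min_count=2, other="Other"):
    """Collapse categories seen fewer than min_count times into one bucket."""
    counts = Counter(categories)
    return [c if counts[c] >= min_count else other for c in categories]


ages = fill_missing([31, None, 45, 27, None])
cities = combine_rare(["Denver", "Denver", "Boise", "Reno", "Denver"])

print(ages)
print(cities)
```

Whether the median is the right fill value, or whether rare categories should be merged at all, depends on the dataset; this only shows the mechanics.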
- 11. Data Transformations Power Transforms and scaling/normalization We do this to make modeling structured data easier
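
To make the idea concrete, here is a small sketch of one transform and two scalers in plain Python. The log transform is used as the simplest member of the power-transform family (it is the Box-Cox transform with lambda = 0); the function names and the income values are invented for illustration.

```python
import math
from statistics import mean, pstdev


def log_transform(values):
    """Natural log: compresses a long right tail. Requires values > 0."""
    return [math.log(v) for v in values]


def standardize(values):
    """Rescale to mean 0 and population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Toy, heavily skewed data: one value dwarfs the rest.
incomes = [20_000, 25_000, 30_000, 40_000, 400_000]
scaled = standardize(log_transform(incomes))

print([round(v, 2) for v in scaled])
```

Applying the log first pulls the outlier closer to the other points, so the standardized values are less dominated by a single row.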
- 12. Functional Programming Applying and composing functions to make code more concise and reusable
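
One way to picture "applying and composing functions" is a small text-cleaning pipeline. The `compose` helper and `collapse_spaces` below are hypothetical names written for this sketch, not part of any library.

```python
from functools import reduce


def compose(*funcs):
    """Return a function that applies funcs left to right."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


def collapse_spaces(text):
    """Replace any run of whitespace with a single space."""
    return " ".join(text.split())


# Build one reusable cleaning step out of three small ones.
clean = compose(str.strip, str.lower, collapse_spaces)

labels = list(map(clean, ["  New   York ", "new york", "NEW YORK  "]))
print(labels)
```

Each step stays small and testable on its own, and the composed `clean` function can be reused on any column of text.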
- 13. Experimental Design Understanding and making consistent experimental choices Hypothesis Testing A/B Testing
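
As a sketch of what an A/B test analysis can look like, the function below runs a two-sided two-proportion z-test, which is one common choice among several. The conversion counts are invented, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Toy numbers: variant A converts 100 of 1000 visitors, variant B 130 of 1000.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The significance threshold and the sample size should be fixed before the experiment runs; deciding them after looking at the data undermines the test.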
- 14. Concepts 03
- 15. Goals ● Predict future events given past data ● Find anomalies in our datasets ● Make recommendations based on someone’s interests
- 16. Methods 1. Clean data so it’s in a format we can model 2. Understand data distributions to inform model selection 3. Perform Exploratory Data Analysis to grasp data 4. Choose modeling techniques that help us solve problems 5. Measure how well our models perform and optimize them 6. Iterate!
- 17. Exploratory What? In statistics, exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.
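
A first pass at EDA can be as small as a numeric summary plus a rough picture of the distribution. The sketch below uses only the standard library; `summarize`, `text_histogram`, and the commute times are made up for the example.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev


def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
        "iqr": q3 - q1,
    }


def text_histogram(values, bin_width):
    """Count values per bin and draw one row of '#' marks per bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return [f"{start:>4} | {'#' * bins[start]}" for start in sorted(bins)]


commute_minutes = [12, 15, 15, 18, 22, 25, 31, 35, 58]
summary = summarize(commute_minutes)
lines = text_histogram(commute_minutes, bin_width=10)

print(summary)
print("\n".join(lines))
```

Even this crude histogram shows a gap before the largest value, the kind of pattern worth noticing before any modeling starts.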
- 18. Regression is just a fancy word for predicting numbers
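
For a concrete case of "predicting numbers", here is a sketch of simple linear regression fitted by ordinary least squares. The sizes and prices are toy values constructed to lie exactly on a line, and `fit_line` is a hypothetical helper.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Toy data built to follow price = 2 * size + 50 exactly.
sizes = [50, 70, 90, 110]
prices = [150, 190, 230, 270]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 100 + intercept  # prediction for an unseen size of 100
print(slope, intercept, predicted)
```

Real data will not sit exactly on a line, so the fitted line is the one that minimizes the squared prediction errors.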
- 19. Classification attempts to tell different things apart
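
One of the simplest ways to tell things apart is a nearest-centroid classifier: average the examples of each class, then assign a new point to the closest average. This is just one illustrative technique, and the points, labels, and function names below are invented.

```python
from math import dist
from statistics import mean


def fit_centroids(points, labels):
    """Average the points of each class into one centroid per label."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(mean(axis) for axis in zip(*members))
        for label, members in groups.items()
    }


def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


points = [(1.0, 1.2), (0.8, 1.0), (4.0, 4.2), (4.4, 3.8)]
labels = ["small", "small", "large", "large"]
centroids = fit_centroids(points, labels)

print(predict(centroids, (1.1, 0.9)))
print(predict(centroids, (3.9, 4.1)))
```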
- 20. Clustering tries to identify groups of similar things based on how far apart they are
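
The distance-based grouping described here can be sketched with a bare-bones k-means loop. The starting centroids are picked by hand to keep the example deterministic (real implementations usually choose them randomly or with a smarter seeding scheme), and the points are toy data.

```python
from math import dist
from statistics import mean


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        centroids = [
            tuple(mean(axis) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])

print(centroids)
print(clusters)
```

Unlike classification, no labels are supplied: the two groups emerge only from how far apart the points are.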
- 21. Reinforcement Learning autonomous agents learn from their environment and make new decisions based on whether they were rewarded or punished
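
A very small taste of this reward-and-punishment loop is the epsilon-greedy multi-armed bandit below. It is a deliberately simplified setting (no states, just repeated choices), and the reward probabilities, `epsilon`, and function name are assumptions made for the sketch.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-looking action,
    occasionally explore a random one."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)  # running estimate of each action's reward
    counts = [0] * len(reward_probs)    # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:
            action = max(range(len(values)), key=values.__getitem__)  # exploit
        # Reward (+1) or punishment (-1) from the simulated environment.
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0
        counts[action] += 1
        # Incremental average: nudge the estimate toward the new reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_bandit([0.1, 0.9])
print(values, counts)
```

Over time the agent should try the second action far more often, because its estimated reward ends up higher.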
- 22. Hands-On Experience 04
- 23. Practice Consistently
- 24. Finding Data Kaggle UCI Data Repository Data.World Government & Local Open Data Web Scraping Public APIs
- 25. Cleaning & Manipulating Data Grasp the basic techniques Build intuition for when to use certain methods Understand pros and cons of each Tools: Excel Python & R SQL
- 26. Getting Practice GitHub Medium Tutorials Hackathons MOOCs
- 27. To get a formal education or not?
- 28. Market Yourself 05 Even while you’re still learning
- 29. Communicate Your Value How have you impacted past businesses? How would your relevant projects help a company? Do you know how to quantify your value?
- 30. GitHub Showing off code projects Connecting with other developers Collaborating and proving technical skills
- 31. Blog / Personal Website Share Expertise Show off Portfolio Provide insight into your thought process
- 32. Thank You! 25% off Getting Started in Data Science Code: VBROWNBAG @DataSciBae ayodeleodubela.com