How to Start Doing Data Science

This talk goes over what Data Science is and how
you can start working with data in
your role. This is for everyone interested in Data
Science who might be unsure about how to
start working with data. Learn the core
concepts of Data Science and how you can
start learning data science pain-free!

  1. Ayodele Odubela: How to Start Doing Data Science
  2. About Me ● Data Scientist @ CometML ● MS in Data Science from Regis University ● Teaching Explainable ML ● Author of Getting Started in Data Science ● Currently writing Uncovering Bias in Machine Learning
  3. Section 01: Skills
  4. What is Data Science? Data science is an interdisciplinary field that uses (somewhat) scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
  5. Coding: SQL, Python, R. Business Sense: understanding and creating new metrics; deciding which methods work in your industry. Math: statistics, probability, linear algebra.
  6. What Data Projects Include: identify a problem; assess the org’s incentives; gather and clean data; data documentation; exploratory analysis; inferential statistics; data storytelling; harm identification and mitigation; creating ML models; building user recourse frameworks.
  7. Relevant Roles. Data Scientist (skills: advanced SQL, intermediate Python/R, intermediate statistics). Machine Learning Engineer (skills: advanced Python, TensorFlow/PyTorch/Keras, intermediate linear algebra, calculus, and statistics). Research Scientist (skills: advanced math, science communication).
  8. Section 02: Tasks
  9. Data Wrangling: Any language can be used to get data from databases and APIs.
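As a rough illustration of the data wrangling slide, here is a minimal sketch of pulling rows out of a database with Python's built-in `sqlite3` module. The `orders` table, its columns, and the `load_rows` helper are hypothetical and exist only for this example; a real project would connect to its own database or API.

```python
import sqlite3


def load_rows(conn, min_amount):
    """Return (customer, amount) rows with amount >= min_amount, largest first."""
    query = (
        "SELECT customer, amount FROM orders "
        "WHERE amount >= ? ORDER BY amount DESC"
    )
    # The "?" placeholder lets sqlite3 bind the value safely.
    return conn.execute(query, (min_amount,)).fetchall()


# Hypothetical in-memory database standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 120.0), ("ben", 35.5), ("cy", 80.0)],
)

rows = load_rows(conn, 50)
print(rows)
```

The same pattern (connect, query, fetch into plain Python structures) carries over to most database drivers, which is one way to read the slide's point that the language matters less than the habit.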
  10. Data Cleaning: dealing with missing values; combining sparse categorical columns.
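The two cleaning tasks on this slide could look something like the sketch below. It assumes that missing values are represented as `None` and that "combining sparse categories" means merging rarely seen labels into a single bucket; both are interpretations for illustration, and the function names are made up here.

```python
from collections import Counter
from statistics import median


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    mid = median(observed)
    return [mid if v is None else v for v in values]


def lump_rare(categories, min_count=2, other="Other"):
    """Merge categories seen fewer than min_count times into one bucket."""
    counts = Counter(categories)
    return [c if counts[c] >= min_count else other for c in categories]


ages = fill_missing([31, None, 45, 28, None])
cities = lump_rare(["Denver", "Denver", "Boise", "Reno", "Denver"])
print(ages)
print(cities)
```

Median imputation is only one option; dropping rows or using a model-based fill can be better depending on why the data is missing.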
  11. Data Transformations: power transforms and scaling/normalization. We do this to make modeling structured data easier.
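To make the transformation slide concrete, the sketch below shows one power transform (Box-Cox, chosen here as a common example, not necessarily the one the talk had in mind) plus min-max scaling and standardization. The `incomes` values are invented for illustration.

```python
import math
from statistics import mean, pstdev


def box_cox(values, lam):
    """Box-Cox power transform; values must be strictly positive."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def min_max_scale(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


incomes = [1.0, 10.0, 100.0, 1000.0]  # heavily skewed toy data
logged = box_cox(incomes, 0)          # lam = 0 is the log transform
scaled = min_max_scale(logged)
z_scores = standardize(logged)
print(scaled)
```

After the log transform the four values are evenly spaced, which is the kind of reshaping that tends to make skewed features easier for many models to use.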
  12. Functional Programming: applying and composing functions to make code more concise and reusable.
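One way to picture "applying and composing functions" is a small text-cleaning pipeline built from reusable steps. The `compose` helper and `drop_punctuation` below are hypothetical names written for this sketch; only `functools.reduce`, `map`, and the `str` methods come from the standard library.

```python
from functools import reduce


def compose(*funcs):
    """Compose left to right: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


def drop_punctuation(text):
    """Keep only letters, digits, and whitespace."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


# Build one reusable cleaner out of three small steps.
clean_text = compose(str.strip, str.lower, drop_punctuation)

cleaned = list(map(clean_text, ["  Hello, World! ", "DATA Science?  "]))
print(cleaned)
```

Each step stays small and testable on its own, and the pipeline can be rearranged or extended without rewriting any of them.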
  13. Experimental Design: understanding and making consistent experimental choices. Hypothesis testing. A/B testing.
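For the A/B testing item, a two-proportion z-test is one common way to check whether two conversion rates differ. The sketch below assumes large samples and a two-sided test; the conversion counts are invented, and the talk does not say which test it would use.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 the best estimate of the shared rate pools both groups.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: variant A converts 100/1000, variant B 130/1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

Deciding the sample size and significance threshold before looking at the data is part of the "consistent experimental choices" the slide mentions.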
  14. Section 03: Concepts
  15. Goals ● Predict future events given past data ● Find anomalies in our datasets ● Make recommendations based on someone’s interests
  16. Methods: 1. Clean data so it’s in a format we can model. 2. Understand data distributions to inform model selection. 3. Perform Exploratory Data Analysis to grasp the data. 4. Choose modeling techniques that help us solve problems. 5. Measure how well our models perform and optimize them. 6. Iterate!
  17. Exploratory What? In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.
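A first pass at EDA often amounts to summary statistics plus a quick look at the distribution. The sketch below uses only the standard library; the `summarize` and `bin_counts` helpers and the sample values are made up for illustration, and a plotting library would normally replace the text histogram.

```python
from statistics import mean, median, quantiles, stdev


def summarize(values):
    """Collect the main characteristics of one numeric column."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
        "iqr": q3 - q1,
    }


def bin_counts(values, bins=4):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin `bins`, so clamp it.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 30]
summary = summarize(values)
histogram = bin_counts(values)
for count in histogram:
    print("#" * count)
```

Here the mean sits above the median and one bin holds a single far-away value, which is the kind of hint about outliers that EDA is meant to surface before any modeling.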
  18. Regression is just a fancy word for predicting numbers.
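As a small instance of "predicting numbers", the sketch below fits a straight line by ordinary least squares. Simple linear regression is only one kind of regression, picked here for brevity, and the data points are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predict a number for an unseen x
print(slope, intercept, prediction)
```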
  19. Classification attempts to tell different things apart.
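One of the simplest ways to "tell different things apart" is k-nearest neighbors: label a new point by majority vote among the closest labeled examples. The talk does not name a specific classifier, so this is just an illustrative choice, with invented training points and labels.

```python
from collections import Counter
from math import dist


def knn_predict(train, point, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples: (features, label).
train = [
    ((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"), ((0.9, 1.1), "cat"),
    ((5.0, 5.0), "dog"), ((5.2, 4.9), "dog"), ((4.8, 5.1), "dog"),
]

first = knn_predict(train, (1.1, 1.0))
second = knn_predict(train, (5.1, 5.0))
print(first, second)
```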
  20. Clustering tries to identify groups of similar things based on how far apart they are.
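The distance-based grouping described here can be sketched with k-means, one widely used clustering method (the slide does not name one). To keep the example deterministic, the starting centroids are passed in by hand; real implementations usually pick them randomly or with a smarter seeding scheme. The points are invented.

```python
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then recenter."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

No labels are supplied anywhere; the two groups emerge purely from how far apart the points are.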
  21. Reinforcement Learning: autonomous agents learn from their environment and make new decisions based on whether they were rewarded or punished.
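The reward-and-punishment loop can be shown in its simplest setting, a multi-armed bandit with an epsilon-greedy agent. This is a deliberately stripped-down sketch: there is no state, the payoffs are fixed numbers invented for the example (negative means punished), and full reinforcement learning problems are considerably richer.

```python
import random


def run_bandit(payoffs, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-looking arm, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(payoffs)  # the agent's current belief about each arm
    pulls = [0] * len(payoffs)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(payoffs))  # explore a random arm
        else:
            arm = max(range(len(payoffs)), key=lambda a: estimates[a])  # exploit
        reward = payoffs[arm]  # the environment's feedback
        pulls[arm] += 1
        # Incremental running average of the rewards seen for this arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return estimates, pulls


# Hypothetical environment: arm 0 punishes, arm 1 rewards, arm 2 is neutral.
estimates, pulls = run_bandit([-1.0, 1.0, 0.0])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_arm, pulls)
```

After being punished once for arm 0, the agent shifts to arm 1 and keeps choosing it, with occasional exploratory pulls of the other arms.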
  22. Section 04: Hands-On Experience
  23. Practice Consistently
  24. Finding Data: Kaggle, UCI Data Repository, Data.World, government and local open data, web scraping, public APIs.
  25. Cleaning & Manipulating Data: grasp the basic techniques; build intuition for when to use certain methods; understand the pros and cons of each. Tools: Excel, Python & R, SQL.
  26. Getting Practice: GitHub, Medium, Tutorials, Hackathons, MOOCs.
  27. To get a formal education or not?
  28. Section 05: Market Yourself (even while you’re still learning)
  29. Communicate Your Value: How have you impacted past businesses? How would your relevant projects help a company? Do you know how to quantify your value?
  30. GitHub: showing off code projects; connecting with other developers; collaborating and proving technical skills.
  31. Blog / Personal Website: share expertise; show off your portfolio; provide insight into your thought process.
  32. Thank You! 25% off Getting Started in Data Science with code VBROWNBAG. @DataSciBae, ayodeleodubela.com
