
Talk Like a Data Scientist

This is a guide to understanding machine learning terminology. It covers causal inference, time series data, linear models and other terms. Download the full eBook here: http://content.nexosis.com/talk-like-a-data-scientist-ebook

Published in: Data & Analytics

Talk like a data scientist: A Guide to Understanding Machine Learning Terminology

Why machine learning matters
- In every data pattern, there are anomalies that break or alter the pattern. There is a reason for these anomalies, but humans are generally terrible at considering, or even processing, all the external forces acting on a data pattern.
- Fortunately for you (and us), this is where machine learning shines: it can analyze vast quantities of inputs (like weather), determine their effect on the pattern, and then assign a dollar amount to each outside factor.

Causal Inference

Causal inference: What is it?
- A cause-and-effect relationship is determined by studying the change, or lack thereof, in some observable or measurable quantity when conditions change.
- Another term for causal inference is impact analysis.

Causal inference: Example
- The infamous button color test is a good, simple example.
- A change in website button clicks (the measurable quantity) may be explained by a change in button color (the condition).
(Figure: Button A vs. Button B)

Causal inference: Why does it matter?
- If you know something happened but not its effect, you might be wasting a lot of time and/or money on things that don’t work.

Time Series Data

Time series data: There are two main types of data
- Cross-sectional data
  - Collected at the same point in time
  - Example: population size per country
- Time series data (what we specialize in)
  - Collected over a period of time
  - Example: hourly web traffic

Time series data
Data scientists look for trends over time, seasonality, and anomalous behavior when analyzing time series data.

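One common way to separate trend, seasonality, and anomalies is classical additive decomposition. The Python sketch below assumes synthetic daily data with a weekly cycle and one injected spike. The helper names and the 3-standard-deviation anomaly threshold are illustrative choices, not a prescribed method.

```python
import statistics


def decompose(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual.

    The trend is a centered moving average of odd length `period`, so the
    first and last period // 2 points get no estimate and are skipped.
    """
    half = period // 2
    trend = {}
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        trend[t] = sum(window) / period
    # Average the detrended values by position in the cycle (e.g. day of week)
    by_phase = {k: [] for k in range(period)}
    for t, level in trend.items():
        by_phase[t % period].append(series[t] - level)
    seasonal = {k: statistics.mean(v) for k, v in by_phase.items()}
    residual = {t: series[t] - level - seasonal[t % period] for t, level in trend.items()}
    return trend, seasonal, residual


def find_anomalies(residual, k=3.0):
    """Flag time steps whose residual exceeds k population standard deviations in size."""
    sd = statistics.pstdev(residual.values())
    return [t for t, r in residual.items() if abs(r) > k * sd]


# Synthetic daily data: upward trend + weekly pattern + one spike on day 30
weekly = [0, 5, 10, 5, 0, -10, -10]
series = [100 + 0.5 * t + weekly[t % 7] for t in range(56)]
series[30] += 40

trend, seasonal, residual = decompose(series, period=7)
print(find_anomalies(residual))
```

Whatever the trend and weekly pattern do not explain ends up in the residual. Large residuals are the candidates for anomalous behavior.
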
Time series data: It’s complicated
- It can be tricky to model.
- It’s important to have a complete, continuous data set.
- Even a week of missing data could adversely affect a model.

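Because even a week-long gap can throw a model off, it is worth checking for missing periods before modeling. This is a minimal sketch, assuming daily data whose timestamps are Python `date` objects. `find_missing_days` is a hypothetical helper, not part of any library.

```python
from datetime import date, timedelta


def find_missing_days(observed_days):
    """Return the calendar days absent between the first and last observation."""
    present = set(observed_days)
    start, end = min(present), max(present)
    missing = []
    day = start
    while day <= end:
        if day not in present:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Illustrative data: daily records for March 2024 with 10 through 16 March absent
observed = [date(2024, 3, d) for d in range(1, 32) if not 10 <= d <= 16]
gaps = find_missing_days(observed)
print(len(gaps), gaps[0], gaps[-1])
```
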
Time series data: What can it tell us?
- Predictive analytics: it’s often possible to predict with a high degree of confidence what the data will look like in the coming weeks.
- Impact analysis: this sort of what-if analysis projects what the data would have looked like had a certain event never taken place.

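The what-if idea behind impact analysis can be sketched in a few lines. This Python example assumes that the pre-event straight-line trend would have continued unchanged. It projects that trend forward as the "no event" counterfactual and sums the gap between actual and projected values. Real impact analysis typically uses richer models (seasonality, covariates, uncertainty intervals). `fit_line` and `estimate_impact` are hypothetical names, and the sales figures are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_impact(series, event_index):
    """Project the pre-event trend forward as the counterfactual and return
    the total difference between actual and projected values after the event."""
    intercept, slope = fit_line(list(range(event_index)), series[:event_index])
    counterfactual = [intercept + slope * t for t in range(event_index, len(series))]
    actual = series[event_index:]
    return sum(a - c for a, c in zip(actual, counterfactual))


# Synthetic weekly sales: steady growth, then a promotion adds 20 units/week from week 10
sales = [200 + 3 * t for t in range(10)] + [200 + 3 * t + 20 for t in range(10, 16)]
print(estimate_impact(sales, event_index=10))
```
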
Linear Models

Linear models: What are they?
- The X’s are predictor variables, a.k.a. independent variables (their values change).
- The A’s are coefficients: parameters that take a single value as a consequence of creating the model (they don’t change).
- The e’s are error terms: the difference between the actual measurement and the model prediction.
- Y is the target: its value depends on the values of the X’s.

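The equation that accompanied this slide is not in the transcript. Assuming the usual textbook form, the definitions above combine into the following equation for one observation with n predictors. Here A_0 is an intercept term, which is our addition to the slide's list.

```latex
Y = A_0 + A_1 X_1 + A_2 X_2 + \cdots + A_n X_n + e
```

Each coefficient A_i scales its own predictor X_i. The error term e absorbs whatever the predictors do not explain.
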
Linear models: They may appear nonlinear

Linear models: Look again
- The predictors in the model on the previous slide can be mapped to a new set of predictor variables.
- The equation can then easily be rewritten, in terms of the new predictors, in a form that looks like a general linear model equation.

Linear models
Not everything has a 1-to-1 relationship. Nonlinear functions contained within a linear model can capture diverse relationships between predictor and target variables.

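To see why a curve can still be a linear model, here is a dependency-free Python sketch. It assumes noise-free data generated from the illustrative model Y = 2 + 0.5X + 3X^2, which is our example and not the one on the original slides. It maps X to the new predictors Z1 = X and Z2 = X^2, then recovers the coefficients with ordinary least squares via the normal equations. The helper names `solve` and `fit_linear_model` are hypothetical. In practice, a library routine such as NumPy's `numpy.linalg.lstsq` is the more robust choice.

```python
def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_linear_model(features, ys):
    """Least-squares coefficients from the normal equations (Z^T Z) A = Z^T y.

    `features` is a list of rows; a leading 1 is prepended to each row for the intercept.
    """
    Z = [[1.0] + list(row) for row in features]
    k = len(Z[0])
    ztz = [[sum(row[i] * row[j] for row in Z) for j in range(k)] for i in range(k)]
    zty = [sum(row[i] * y for row, y in zip(Z, ys)) for i in range(k)]
    return solve(ztz, zty)


# Curved data generated from Y = 2 + 0.5*X + 3*X^2 (no noise, for illustration)
xs = [-3, -2, -1, 0, 1, 2, 3, 4]
ys = [2 + 0.5 * x + 3 * x ** 2 for x in xs]

# Map each X to new predictors Z1 = X and Z2 = X^2; the model is linear in the A's
features = [(x, x ** 2) for x in xs]
print([round(a, 6) for a in fit_linear_model(features, ys)])
```

The A's only ever multiply Z1 and Z2, so the same least-squares machinery used for straight lines fits this curve. The nonlinearity lives entirely in how the predictors are constructed.
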
Linear models: What is NOT a linear model?
- An equation is not a linear model if it can’t be rewritten as one.
- No matter how such an equation is manipulated, there is no way to isolate the coefficients so that it can be rewritten in linear form.

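The slide's own example did not survive in this transcript. One standard illustration, which is our choice and not necessarily the slide's, is an exponential curve with an offset, where a coefficient sits inside the exponent:

```latex
Y = A_0 + A_1 e^{A_2 X} + e
```

Here A_2 is multiplied by X inside the exponential, so Y cannot be written as a sum of terms in which each coefficient simply multiplies a fixed predictor. Fitting a model like this requires nonlinear least squares rather than ordinary linear regression.
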
Linear models: Why should we use them?
- They can often model consumer behavior accurately and answer a number of business intelligence questions.
- They have a simple interpretation, since it’s easy to inspect the contribution of each predictor and coefficient pair.

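Inspecting "the contribution of each predictor and coefficient pair" amounts to multiplying each coefficient by its predictor value. Together with the intercept, these products add up to the prediction, which is how a dollar amount can be assigned to each outside factor. This is a minimal sketch: the coefficients and predictor names below are hypothetical and not taken from any fitted model.

```python
def contributions(intercept, coefficients, observation):
    """Break a linear model's prediction into one term per predictor.

    Each contribution is coefficient * predictor value; with the intercept
    they sum exactly to the prediction.
    """
    parts = {name: coefficients[name] * observation[name] for name in coefficients}
    prediction = intercept + sum(parts.values())
    return prediction, parts


# Hypothetical fitted model of daily revenue in dollars (illustrative coefficients)
intercept = 1200.0
coefficients = {"temperature_c": 15.0, "promotion": 350.0, "rainfall_mm": -8.0}
today = {"temperature_c": 22.0, "promotion": 1.0, "rainfall_mm": 10.0}

prediction, parts = contributions(intercept, coefficients, today)
for name, dollars in sorted(parts.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>14}: {dollars:+8.2f}")
print(f"{'prediction':>14}: {prediction:8.2f}")
```
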
Other topics covered in our eBook
- Regressions
- Training Set vs. Test Set
- Residuals
- MAPE (Mean Absolute Percent Error)
- The Bias & Variance Trade-off
- Seasonality
- Trend
- Autoregressive Models

Download the eBook
Feel enlightened by data science yet? We hope so! Download our “Talk Like a Data Scientist” eBook to learn more terms.
