Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Exploratory data analysis and models in R


Published on

Effective exploratory data analyses is everything.
a. Tidy data.
b. Inspect data structure.
c. Data viz.
d. Basic exploratory data analyses.

However, now that we are ready to apply models, we add in one more tiny step. Visualize the data to better understand its typology and underlying distribution. Then, you are ready to fit your models.

A statistical model is an elegant, representative simplification of the patterns you have identified through data viz and EDA. It should capture data/experimental structure including the key variables, appropriate levels, and relevant covariation or contexts that mediate outcomes. It should support the data viz. It should provide an estimate of the statistical likelihood or probability of differences. Ideally, the underlying coefficients should also be mined to convey an estimate of effect sizes. A t.test, chi.square test, linear model, general linear model, or generalized linear mixed model are all examples of models that describe and summarize patterns and each have associated assumptions about the data they embody. Hence, the final step pre-model fit, is explore distributions.

Conceptually, there are two kind of models. Those that look back and those that look forward. Think tardis or time machine. A model is always a snapshot using your time machine. It can be a grab of what happened or a future snap of what you predict. In R, there is simple code to time travel in either direction. Actually, there is no time - see arrow of time theory - only an observer potential perception of it. Statistical models are our observers here. These observers use ‘probability distributions’ as we described in 'statistical thinking' to calibrate what we think we observed or will observe given the evidence at hand.

Dalgaard 2008. Introductory statistics with R.
Grolemund & Wickham 2016. R for Data Science.

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

Exploratory data analysis and models in R

  1. 1. @cjlortie exploratory data analysis & models in R
  2. 2. untangle, describe, predict, share, advance understanding
  3. 3. read data/build frame inspect data structure visualize data EDA fit model
  4. 4. EDA != the statistical model
  5. 5. EDA creative adventures with your data
  6. 6. EDA structure scope of inference data quality [ ]
  7. 7. statistical model an elegant, representative simplification of the patterns identified through data viz & EDA
  8. 8. model convey structure support evidence estimate probability [ ]
  9. 9. two kinds of statistical models descriptive or predictive
  10. 10. time travel backward or forward but not both
  11. 11. effective model use is still process not product effect sizes and coefficients
  12. 12. fit, inspect, mine
  13. 13. Dalgaard 2008
  14. 14. t.test to GLMM
  15. 15. summary and coef
  16. 16. predict make_prediction add_predictions add_residuals
  17. 17. train & test predictions are for new outcomes & naive data
  18. 18. bootstrapping
  19. 19. fixed or random effects
  20. 20. big picture of models is to advance understanding ‘R for Data Science’ convergence with R for statistics