# Exploratory data analysis and models in R

Effective exploratory data analysis is everything.
a. Tidy data.
b. Inspect data structure.
c. Data viz.
d. Basic exploratory data analyses.
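
A minimal sketch of steps a through d in base R. It assumes the built-in `mtcars` data set as a stand-in for your own data; the object names `cars` and `group_means` are illustrative only.

```r
# mtcars ships with base R; it stands in for your own data here (an assumption).
cars <- mtcars

# a. Tidy: store categorical variables as factors with readable labels.
cars$am  <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))
cars$cyl <- factor(cars$cyl)

# b. Inspect data structure: variable types, dimensions, and missing values.
str(cars)
dim(cars)
colSums(is.na(cars))

# c. Data viz: a quick look at one response across groups.
boxplot(mpg ~ cyl, data = cars, xlab = "cylinders", ylab = "miles per gallon")

# d. Basic EDA: numeric summaries overall and by group.
summary(cars$mpg)
group_means <- aggregate(mpg ~ cyl, data = cars, FUN = mean)
print(group_means)
```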

However, now that we are ready to apply models, we add one more small step: visualize the data to better understand its typology and underlying distribution. Then you are ready to fit your models.
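
One way this extra step might look in base R, again assuming the built-in `mtcars` data rather than your own. The Shapiro-Wilk test is an optional companion to the plots and is an addition here, not a step named in the deck.

```r
# Response variable to examine (mtcars is an assumed example data set).
mpg <- mtcars$mpg

# Histogram and normal Q-Q plot to eyeball shape, skew, and outliers.
hist(mpg, main = "Distribution of mpg", xlab = "miles per gallon")
qqnorm(mpg)
qqline(mpg)

# A formal check to accompany the plots: Shapiro-Wilk test of normality.
normality <- shapiro.test(mpg)
print(normality)
```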

A statistical model is an elegant, representative simplification of the patterns you have identified through data viz and EDA. It should capture the data or experimental structure, including the key variables, appropriate levels, and relevant covariation or contexts that mediate outcomes. It should support the data viz. It should provide an estimate of the statistical likelihood or probability of differences. Ideally, the underlying coefficients should also be mined to convey an estimate of effect sizes. A t-test, chi-square test, linear model, general linear model, or generalized linear mixed model are all examples of models that describe and summarize patterns, and each has associated assumptions about the data it embodies. Hence, the final step before fitting a model is to explore distributions.
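
As a rough illustration of that range of models, the sketch below fits four of them in base R to the built-in `mtcars` data (an assumed example, as are the variable pairings). A generalized linear mixed model is left out because it needs an add-on package such as lme4.

```r
# t-test: does mean mpg differ between transmission types?
tt <- t.test(mpg ~ am, data = mtcars)

# Chi-square test: are transmission type and engine shape associated?
chi <- chisq.test(table(mtcars$am, mtcars$vs))

# Linear model: mpg as a function of weight and transmission.
fit_lm <- lm(mpg ~ wt + factor(am), data = mtcars)

# Generalized linear model: binary response with a logit link.
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# Each model gives a probability statement about the pattern...
tt$p.value
chi$p.value
anova(fit_lm)

# ...and coefficients that can be read as effect sizes on the model's scale.
coef(fit_lm)
coef(fit_glm)
```

Each of these calls carries its own assumptions (for example, roughly normal residuals for `lm`), which is why the distribution check comes first.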

Conceptually, there are two kinds of models: those that look back and those that look forward. Think TARDIS or time machine. A model is always a snapshot taken with your time machine. It can be a grab of what happened or a future snap of what you predict. In R, there is simple code to time travel in either direction. Actually, there is no time (see arrow of time theory), only an observer's potential perception of it. Statistical models are our observers here. These observers use 'probability distributions', as we described in 'statistical thinking', to calibrate what we think we observed or will observe given the evidence at hand.
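
A small sketch of both directions using one fitted model. It assumes the built-in `mtcars` data, and the weights in `new_cars` are made up for illustration.

```r
# One simple linear model, used in both directions.
fit <- lm(mpg ~ wt, data = mtcars)

# Looking back: describe the relationship in the data already collected.
coef(fit)
summary(fit)$r.squared

# Looking forward: predict mpg for hypothetical cars not in the data.
new_cars <- data.frame(wt = c(2.5, 3.5))  # weight in 1000 lb, made-up values
forecast <- predict(fit, newdata = new_cars, interval = "prediction")
print(forecast)
```

The `interval = "prediction"` argument returns lower and upper bounds alongside each point estimate, which is where the probability distribution enters the forward-looking case.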

Sources
Dalgaard 2008. Introductory statistics with R.
Grolemund & Wickham 2016. R for Data Science.

Published in: Data & Analytics

### Exploratory data analysis and models in R

1. @cjlortie: exploratory data analysis & models in R
2. untangle, describe, predict, share, advance understanding
3. read data/build frame, inspect data structure, visualize data, EDA, fit model
4. EDA != the statistical model
5. EDA: creative adventures with your data
6. EDA: structure, scope of inference, data quality
7. statistical model: an elegant, representative simplification of the patterns identified through data viz & EDA
8. model: convey structure, support evidence, estimate probability
9. two kinds of statistical models: descriptive or predictive
10. time travel: backward or forward, but not both
11. effective model use is still process, not product: effect sizes and coefficients
12. fit, inspect, mine
13. Dalgaard 2008
14. t.test to GLMM
15. summary and coef
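
The "fit, inspect, mine" sequence from slides 12 and 15 might be sketched as follows. The model formula and the built-in `mtcars` data are assumptions for illustration; `confint` is added here as one common way to express uncertainty around the coefficients.

```r
# Fit: a linear model on an assumed example data set.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Inspect: the overall summary (residuals, coefficient table, R-squared).
fit_summary <- summary(fit)
print(fit_summary)

# Mine: pull out the pieces that convey effect sizes and their uncertainty.
estimates  <- coef(fit)                  # point estimates only
coef_table <- coef(fit_summary)          # estimate, std. error, t value, p value
intervals  <- confint(fit, level = 0.95) # 95% confidence intervals

print(coef_table)
print(intervals)
```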