# Workshop on Quantitative Analytics Using Interactive On-line Tool

Introduction to a user-friendly on-line interactive statistical tool built using Shiny and R (Part I)

Published in: Data & Analytics
• Workshop Materials: http://ssrc.indiana.edu/seminars/wimdocs/2017-02-03_wim_scrivner_shiny-tools_files.zip


### Workshop on Quantitative Analytics Using Interactive On-line Tool

1. Interactive Visual Data Analysis, Part One: Language Variation Suite. Olga Scrivner, Indiana University, Workshop in Methods. Deck sections: Introduction, Data Preparation, LVS, Working with Data, Visual Analytics, Inferential Analysis, References.
2. Outline: (1) introduce a web application for quantitative analysis, LVS; (2) develop practical skills; (3) understand and interpret advanced statistical models. [Figure: example conditional tree with splits on PositionSentence, Heaviness, Period, Focus, and Main_Verb_Structure; the terminal nodes show the proportions of VO and OV.]
3. What is LVS? Language Variation Suite is a Shiny web application originally designed for data analysis in sociolinguistic research. It can be used for processing spreadsheet data; reporting in tables and graphs; and analyzing means, regression, conditional trees, and much more.
4. Background: LVS is built in R using the Shiny package. (1) R is a free programming language for statistical computing and graphics. (2) Shiny is a web application framework for R. Together they combine the computational power of R with web interactivity.
5. Background: http://littleactuary.github.io/blog/Web-application-framework-with-Shiny/
6. Workspace. Browser: Chrome, Firefox, or Safari is recommended; Internet Explorer may cause instability. Accessibility: on a PC, Mac, or Linux machine, data files can be uploaded from any location on your computer; on a smartphone, data files must be on a cloud platform connected to your phone account (e.g. Dropbox).
7. Server: Because LVS is hosted on a server, Shiny's idle time-out settings may stop the application when it is left inactive (the page greys out). Solution: click reload and re-upload your CSV file.
8. Data Preparation. Important things to consider before data entry. File format: comma-separated values (CSV) gives faster processing; Excel format will slow processing. Column names should not contain spaces; permitted are non-accented characters, numbers, underscore, hyphen, and period. One column must contain your dependent variable; the rest of the columns contain independent variables.
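The naming rules on this slide can be checked before a file is uploaded. The following is only a sketch in Python, not part of LVS (which runs in R): the helper `invalid_column_names` is hypothetical, and it assumes that "permitted" means exactly unaccented ASCII letters, digits, underscore, hyphen, and period.

```python
import csv
import io
import re

# Assumed reading of the slide: unaccented ASCII letters, digits,
# underscore, hyphen, and period are the only permitted characters.
PERMITTED = re.compile(r"[A-Za-z0-9_.\-]+")


def invalid_column_names(csv_text):
    """Return the header names that break the naming rules (hypothetical helper)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if not PERMITTED.fullmatch(name)]


# Made-up two-line CSV used purely for illustration.
sample = "title year,budget,género,actor_1_name\n2009,237000000,Action,CCH Pounder\n"
print(invalid_column_names(sample))  # ['title year', 'género']
```

An empty result means every column name passes the assumed rules.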
9. Terminology Review: (a) Categorical: non-numerical data with two values (yes/no; male/female). (b) Continuous: numerical data (duration, age, year). (c) Multinomial: non-numerical data with three or more values (regions, nationalities). (d) Ordinal: scale; currently not supported.
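The terminology above can be turned into a rough rule of thumb for a single column. This Python sketch is an illustration only; the function `variable_type` is hypothetical, and it assumes that any column whose values all parse as numbers is continuous, so a factor coded as 0/1 would be mislabeled.

```python
def variable_type(values):
    """Label a column with the slide's terminology (hypothetical helper)."""
    try:
        for v in values:
            float(v)  # raises ValueError on the first non-numeric value
        return "continuous"
    except ValueError:
        pass
    # Non-numerical data: two values -> categorical, three or more -> multinomial.
    levels = len(set(values))
    return "categorical" if levels <= 2 else "multinomial"


print(variable_type(["yes", "no", "yes"]))         # categorical
print(variable_type(["34", "51.5", "27"]))         # continuous
print(variable_type(["north", "south", "west"]))   # multinomial
```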
11. Workshop Files (http://ssrc.indiana.edu/seminars/wim.shtml): (1) movie_metadata.csv, a simplified set from https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset; (2) LVS web site: https://languagevariationsuite.com
12. Movie Data: Budget, Director, Actor 1, Director Facebook likes, Actor 1 Facebook likes, Genre, Year.
13. Language Variation Suite - Structure: (1) Data: upload file, data summary, adjust data, cross tabulation. (2) Visual Analysis: plotting, cluster classification. (3) Inferential Statistics: modeling, regression, conditional trees, random forest.
15. Upload File: upload movie_metadata.csv.
16. Uploaded Dataset: The data content is imported as a table and allows for sorting columns.
17. Summary: provides a quantitative summary for each variable, e.g. frequency count, mean, median.
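The kind of per-variable summary described here can be sketched in a few lines of Python. This is not how LVS computes it (LVS runs in R); the function `summarize` and the sample values are made up for illustration.

```python
import statistics
from collections import Counter


def summarize(values):
    """Frequency counts for text columns, mean and median for numeric ones."""
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        # At least one value is not a number: treat the column as categorical.
        return dict(Counter(values))
    return {"mean": statistics.mean(numbers), "median": statistics.median(numbers)}


# Made-up values, not taken from the workshop dataset.
print(summarize(["Action", "Drama", "Action"]))  # {'Action': 2, 'Drama': 1}
print(summarize(["10", "20", "60"]))             # {'mean': 30.0, 'median': 20.0}
```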
18. Data Structure: (1) total number of observations (rows); (2) number of variables (columns); (3) variable types: Factor for categorical values, Num for numeric values (0.95, 1.05), Int for integer values (1, 2, 3).
19. Cross Tabulation: Cross-tabulation examines the relationship between variables.
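A cross-tabulation is simply a table of counts for every combination of two variables. The Python sketch below shows the idea under stated assumptions: the function `cross_tab`, the column name `period`, and the four rows are all invented for illustration and do not come from the workshop file.

```python
from collections import Counter


def cross_tab(rows, row_var, col_var):
    """Count how often each pair of levels occurs together."""
    counts = Counter((r[row_var], r[col_var]) for r in rows)
    row_levels = sorted({r[row_var] for r in rows})
    col_levels = sorted({r[col_var] for r in rows})
    # A Counter returns 0 for combinations that never occur.
    return {rl: {cl: counts[(rl, cl)] for cl in col_levels} for rl in row_levels}


# Made-up rows for illustration only.
rows = [
    {"genre": "Action", "period": "old"},
    {"genre": "Action", "period": "new"},
    {"genre": "Drama", "period": "new"},
    {"genre": "Action", "period": "new"},
]
print(cross_tab(rows, "genre", "period"))
# {'Action': {'new': 2, 'old': 1}, 'Drama': {'new': 1, 'old': 0}}
```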
20. Cross Tabulation Plot
22. Adjusting Browser - Layout: Shiny pages are fluid and reactive.
23. Adjusting Browser - Layout
24. Language Variation Suite - Structure: (1) Data: upload file, data summary, adjust data, cross tabulation. (2) Visual Analysis: plotting, cluster classification. (3) Inferential Statistics: modeling, regression, varbrul analysis, conditional trees, random forest.
25. Visual Analytics: “The science of analytical reasoning facilitated by visual interactive interfaces” (Thomas et al. 2005).
26. One Variable Plot
28. Customizing Plot
30. Saving Plot
31. Cluster Plot: Classification of data into sub-groups is based on pairwise similarities. Groups are clustered in the form of a tree-like dendrogram.
32. Cluster Plot
33. Cluster Plot: Group 1 contains Animation and Biography; Group 2 contains Action, Drama, and Comedy.
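How pairwise similarities turn into a dendrogram can be sketched with a tiny agglomerative procedure. The slides do not say which distance measure or linkage LVS uses, so this Python sketch assumes single linkage on one made-up numeric score per genre; the function `single_linkage` and the scores are hypothetical.

```python
def single_linkage(points):
    """Agglomerative clustering; returns the merges in the order they happen.

    points maps a label to one numeric score (an assumption for this sketch).
    """
    clusters = [frozenset([name]) for name in points]
    merges = []

    def dist(a, b):
        # Single linkage: distance between the two closest members.
        return min(abs(points[x] - points[y]) for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(
            ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
            key=lambda pair: dist(*pair),
        )
        merges.append((sorted(a), sorted(b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Made-up scores chosen so that two groups emerge; not workshop data.
scores = {"Animation": 9.0, "Biography": 8.2, "Action": 3.0, "Drama": 2.1, "Comedy": 1.0}
merges = single_linkage(scores)
print(merges[-1])  # (['Animation', 'Biography'], ['Action', 'Comedy', 'Drama'])
```

The last merge joins the two top-level branches of the dendrogram, which correspond to the two groups read off the plot.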
34. Inferential Statistics
35. Language Variation Suite - Structure: (1) Data: upload file, data summary, adjust data, cross tabulation. (2) Visual Analysis: plotting, cluster classification. (3) Inferential Statistics: modeling, regression, varbrul analysis, conditional trees, random forest.
36. How to Create a Regression Model. Step 1, Modeling: create a model with dependent and independent variables. Step 2, Regression: specify the type of regression (fixed, mixed) and the type of dependent variable (binary, continuous, multinomial). Step 3, Stepwise Regression: compare models (log-likelihood, AIC, BIC). Step 4, Conditional Trees: apply non-parametric tests to the model.
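For the model comparison in Step 3, AIC and BIC are both computed from a model's log-likelihood, with a penalty for the number of estimated parameters; lower values are preferred. The Python sketch below uses the standard definitions, with made-up log-likelihoods and parameter counts that are not output from LVS.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Made-up values for two candidate models fitted to the same 200 rows.
small = {"ll": -520.0, "k": 3}
large = {"ll": -518.5, "k": 6}
print(aic(small["ll"], small["k"]))  # 1046.0
print(aic(large["ll"], large["k"]))  # 1049.0
print(bic(small["ll"], small["k"], 200) < bic(large["ll"], large["k"], 200))  # True
```

In this invented case the larger model fits slightly better but not by enough to offset its three extra parameters, so both criteria favor the smaller one.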
37. Modeling
38. Regression Types. Model: (a) fixed effect; (b) mixed effect, for individual speaker/token variation (within group). Type of dependent variable: (a) binary/categorical (only two values); (b) continuous (numeric); (c) multinomial (categorical with more than two values).
39. Regression
40. Model Output
42. Interpretation: Budget and Genre. Genre Action is the reference value. A positive coefficient indicates a positive effect relative to the reference value; a negative coefficient indicates a negative effect. Exponential notation: 1.46e-7 = 0.000000146 (converter: http://www.free-online-calculator-use.com/scientific-notation-converter.html).
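Both points on this slide can be illustrated in Python. The sketch assumes R's default treatment coding, in which the intercept is the fitted value for the reference level (Action) and every other coefficient is an offset from it. The functions `to_plain_decimal` and `predicted_mean`, and all numbers except 1.46e-7, are made up for illustration.

```python
from decimal import Decimal


def to_plain_decimal(text):
    """Rewrite exponential notation such as '1.46e-7' without the exponent."""
    return format(Decimal(text), "f")


def predicted_mean(intercept, offsets, level):
    """Fitted value under treatment coding; the reference level has no offset."""
    return intercept + offsets.get(level, 0.0)


print(to_plain_decimal("1.46e-7"))  # 0.000000146

# Made-up coefficients: Action is the reference level, so it is absent here.
offsets = {"Drama": -80.0, "Animation": 5.0}
print(predicted_mean(100.0, offsets, "Action"))  # 100.0
print(predicted_mean(100.0, offsets, "Drama"))   # 20.0
```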
44. Conditional Tree: a simple non-parametric regression analysis, commonly used in social and psychological studies. In linear regression, all information is combined linearly. Conditional tree regression uses visual splitting to capture interactions between variables, by recursive splitting (tree branches).
45. Conditional Tree
47. Conditional Tree: (1) Genre is the significant factor for budget. (2) The budget distribution is split into two groups: Action and Animation versus Biography, Comedy, and Drama. (3) Budget is significantly higher for Animation and Action.
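A single split of this kind can be sketched in Python. Note the substitution: conditional trees choose splits with significance tests, whereas this sketch uses a simpler least-squares criterion (the two-group partition with the smallest within-group squared error). The function `best_two_group_split` and the budget figures are made up; they are not the workshop data.

```python
from statistics import mean


def best_two_group_split(observations):
    """Split the levels of one factor into two groups by least squares.

    observations is a list of (level, value) pairs with at least two levels.
    """
    by_level = {}
    for level, value in observations:
        by_level.setdefault(level, []).append(value)
    # Ordering levels by their mean lets us try only len(levels) - 1 cut points.
    ordered = sorted(by_level, key=lambda lv: mean(by_level[lv]))

    def sse(levels):
        values = [v for lv in levels for v in by_level[lv]]
        centre = mean(values)
        return sum((v - centre) ** 2 for v in values)

    cut = min(range(1, len(ordered)), key=lambda i: sse(ordered[:i]) + sse(ordered[i:]))
    return sorted(ordered[:cut]), sorted(ordered[cut:])


# Made-up budgets (in millions) for illustration only.
data = [
    ("Action", 90), ("Action", 110), ("Animation", 95), ("Animation", 105),
    ("Biography", 20), ("Biography", 30), ("Comedy", 25), ("Comedy", 35),
    ("Drama", 15), ("Drama", 25),
]
low, high = best_two_group_split(data)
print(low)   # ['Biography', 'Comedy', 'Drama']
print(high)  # ['Action', 'Animation']
```

A full tree would repeat the same search inside each resulting group, which is the recursive splitting the earlier slide refers to.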
48. Random Forest: (1) variable importance for predictors; (2) a robust technique for "small n, large p" data; (3) all predictors are considered jointly (allows for inclusion of correlated factors).
49. Random Forest: Let’s add more factors! Return to Modeling and add independent factors: director_facebook_likes, actor_1_facebook_likes, title_year.
50. Random Forest
51. Random Forest: Genre is the most important predictor for this model. Predictors close to zero or to the red dotted line (the cut-off value) are irrelevant for this model.
52. Let’s Have a Short Break
53. References I: [1] Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge: Cambridge University Press. [2] Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber and Randi Reppen (eds.), The Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press. [3] Schnapp, Jeffrey, and Peter Presner. 2009. Digital Humanities Manifesto 2.0. [4] http://gifsanimados.espaciolatino.com/x bob esponja 8.gif [5] https://daniellestolt.files.wordpress.com/2013/01/are-you-ready1.gif [6] http://www.martijnwieling.nl/R/sheets.pdf