Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Visual Analytics for Linguistics - Day 4 ESSLLI - structured data
1. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Visual Analytics for Linguistics - Day 4
Olga Scrivner
2. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
What You Will Learn
DAY 1 Introduction to Visual Analytics
DAY 2 Visualization Methods, Design, and Tools
DAY 3 Working with Unstructured Data
DAY 4 Working with Structured Data
DAY 5 Advanced Analytics
3. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Our Materials - Web Site
http:
//obscrivn.wixsite.com/visualization
4. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
What We Need
Interactive Text Mining Suite
Language Variation Suite
Download R code
Download files from Day 3 and Day 4
R and Rstudio
R libraries: tm, party, partykit
5. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
What We Need
Interactive Text Mining Suite
Language Variation Suite
Download R code
Download files from Day 3 and Day 4
R and Rstudio
R libraries: tm, party, partykit
6. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Text Mining Review
Intro to ITMS - slides Day 3
R coding
7. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Text Mining Review
1. Open RStudio
2. Open R file textmining.R:
3. Set up working directory
8. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Text Mining Package - tm
Main structure - corpus
Corpus is constructed via DirSource, VectorSource,
DataframeSource
mycorpus <- Corpus(VectorSource(file.txt))
9. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Preprocessing with tm
lower case
mycorpus <- tm_map(mycorpus, tolower)
remove punctuation
mycorpus <- tm_map(mycorpus,
removePunctuation)
remove numbers
mycorpus <- tm_map(mycorpus, removeNumbers)
remove stopwords
mycorpus <- tm_map(mycorpus, removeWords,
stopwords(’english’))
mycorpus <- tm_map(mycorpus, stripWhitespace)
mycorpus <- tm_map(mycorpus,
PlainTextDocument)
10. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
What is LVS?
Language Variation Suite
It is a Shiny web application originally designed for data
analysis in sociolinguistic research.
It can be used for:
Processing spreadsheet data
Visualizing data
Analyzing means, regression, conditional trees ...
(and much more)
11. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Background
LVS is built in R using Shiny package:
1. R - a free programming language for statistical
computing and graphics
2. Shiny App - a web application framework for R
Computational power of R + Web interactivity
12. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Background
http://littleactuary.github.io/blog/Web-application-framework-with-Shiny/
13. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Workspace
Browser
Chrome, Firefox, Safari - recommendable
Explorer may cause instability issues
Accessibility
PC, Mac, Linux
Data files will be uploaded from any location on your
computer
Smart Phone
Data files must be on a cloud platform connected to
your phone account (e.g. dropbox)
14. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Server
Since LVS is hosted on a server, Shiny idle time-out settings
may stop application when it is left inactive (it will grey out).
Solution: Click reload and re-upload your csv file
15. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Data Preparation
Important things to consider before data entry:
File format:
Comma separated value (CSV) - faster processing
Excel format will slow processing
Column names should not contain spaces
Permitted: non-accented characters, numbers,
underscore, hyphen, and period
One column must contain your dependent variable
The rest of the columns contain independent variables
16. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Terminology Review
a. Categorical - non-numerical data with two values
yes - no; male - female
b. Continuous - numerical data
duration, age, year
c. Multinomial - non-numerical data with three or more
values
regions, nationalities
d. Ordinal - scale: currently not supported
17. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Terminology Review
a. Categorical - non-numerical data with two values
yes - no; male - female
b. Continuous - numerical data
duration, age, year
c. Multinomial - non-numerical data with three or more
values
regions, nationalities
d. Ordinal - scale: currently not supported
18. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Workshop Files
1. http://obscrivn.wixsite.com/visualization
Download file Day 4: movie_metadata.csv
Simplified set from https://www.kaggle.com/
deepmatrix/imdb-5000-movie-dataset
2. LVS web site: http://languagevariationsuite.com/
http://cl.indiana.edu/~obscrivn/docs/movie_metadata.csv
19. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Movie Data
Budget
Director
Actor 1
Director facebook likes
Actor 1 facebook likes
Genre
Year
20. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Language Variation Suite - Structure
1. Data
Upload file, data summary, adjust data, cross tabulation
2. Visual Analysis
Plotting, cluster classification
3. Inferential Statistics
Modeling, regression, conditional trees, random forest
21. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Language Variation Suite - Structure
1. Data
Upload file, data summary, adjust data, cross tabulation
2. Visual Analysis
Plotting, cluster classification
3. Inferential Statistics
Modeling, regression, conditional trees, random forest
22. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Upload File
Upload movie_metadata.csv
23. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Uploaded Dataset
The data content is imported as a table and allows for
sorting columns.
24. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Summary
Summary provides a quantitative summary for each variable,
e.g. frequency count, mean, median.
25. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Data Structure
1. Total number of observations (rows)
2. Number of variables (columns)
3. Variable types
Factor - categorical values
Num - numeric values (0.95, 1.05)
Int - integer values (1, 2, 3)
26. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Cross Tabulation
Cross-tabulation examines the relationship between variables.
27. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Cross Tabulation Plot
28. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Cross Tabulation Plot
29. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Adjusting Browser - Layout
Shiny pages are fluid and reactive.
30. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Adjusting Browser - Layout
31. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Language Variation Suite - Structure
1. Data
Upload file, data summary, adjust data, cross tabulation
2. Visual Analysis
Plotting, cluster classification
3. Inferential statistics
Modeling, regression, varbrul analysis, conditional trees,
random forest
32. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
One Variable Plot
33. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
One Variable Plot
34. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Customizing Plot
35. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Customizing Plot
36. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Saving Plot
37. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Cluster Plot
Classification of data into sub-groups is based on
pairwise similarities
Groups are clustered in the form of a tree-like
dendrogram
38. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Cluster Plot
39. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Cluster Plot
Group 1 Animation, Biography and Group 2 Action, Drama,
Comedy
40. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Inferential Statistics
41. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Language Variation Suite - Structure
1. Data
Upload file, data summary, adjust data, cross tabulation
2. Visual Analysis
Plotting, cluster classification
3. Inferential statistics
Modeling, regression, varbrul analysis, conditional trees,
random forest
42. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
How to Create a Regression Model
Step 1 Modeling - create a model with dependent
and independent variables
Step 2 Regression - specify the type of regression
(fixed, mixed) and type of dependent variable
(binary, continuous, multinomial)
Step 3 Stepwise Regression - compare models
(Log-likelihood, AIC, BIC)
Step 4 Conditional Trees - apply non-parametric
tests to the model
43. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Modeling
44. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Regression Types
Model
a.) Fixed effect
b.) Mixed effect - individual speaker/token variation (within
group)
Type of Dependent Variable
a.) Binary/categorical (only two values)
b.) Continuous (numeric)
c.) Multinomial - categorical with more than two values
45. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Regression
46. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Model Output
47. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Model Output
48. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Interpretation: Budget and Genre
Genre Action is the reference value
Positive coefficient - positive effect
Negative coefficient - negative effect
http://www.free-online-calculator-use.com/scientific-notation-converter.html
49. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Interpretation: Budget and Genre
Genre Action is the reference value
Positive coefficient - positive effect
Negative coefficient - negative effect
http://www.free-online-calculator-use.com/scientific-notation-converter.html
exponential notation:
1.46e-7
0.000000146
50. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Conditional Tree
Conditional tree: a simple non-parametric regression analysis,
commonly used in social and psychological studies
Linear regression: all information is combined linearly
Conditional tree regression: visual splitting to capture
interaction between variables
Recursive splitting (tree branches)
51. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Conditional Tree
52. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Conditional Tree
53. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Conditional Tree
1. Genre is the significant factor for budget
2. Budget distribution is split in two groups:
Action and Animation
Biography, Comedy and Drama
3. Budget is significantly higher for Animation and Action
54. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Random Forest
1. Variable importance for predictors
2. Robust technique with small n large p data
3. All predictors considered jointly (allows for inclusion of
correlated factors)
55. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Random Forest
Let’s add more factors!
Return to Modeling
Add independent factors: director facebook likes,
actor 1 facebook likes, title year
56. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Random Forest
57. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Random Forest
Genre is the most important predictor for this model.
Close to zero or red-dotted line (cut off values) -
irrelevant for this model
58. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Assignment
Select LVS or ITMS
Upload your own file (csv for LVS or txt/pdf for ITMS)
Explore your data
59. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
References I
Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics.
Cambridge: Cambridge University Press
Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas
Biber Randi Reppen (eds.), The Cambridge Handbook of English Corpus Linguistics.
Cambridge: Cambridge University Press
Schnapp, Jeffrey, and Peter Presner. 2009. Digital Humanities Manifesto 2.0.
http://gifsanimados.espaciolatino.com/x_bob_esponja_8.gif
https://daniellestolt.files.wordpress.com/2013/01/are-you-ready1.gif
http://www.martijnwieling.nl/R/sheets.pdf
60. Visual Analytics
for Linguistics -
Day 4
Olga Scrivner
Course Info
tm Package
Statistical
Visualization
Data Preparation
LVS
Working with
Data
Visual Analytics
Inferential
Analysis
Resources
http://www.rdatamining.com/examples/text-mining
https:
//en.wikibooks.org/wiki/R_Programming/Text_Processing
http://data.library.virginia.edu/
reading-pdf-files-into-r-for-text-mining/
http://www.katrinerk.com/courses/
words-in-a-haystack-an-introductory-statistics-course/
schedule-words-in-a-haystack/
r-code-the-text-mining-package
tm package