SlideShare a Scribd company logo
Language Variation Suite Toolkit
Olga Scrivner, Rafael Orozco, Manuel Díaz-Campos
NWAV47, October 18, 2018
Indiana University and Louisiana State University
Workshop Information
What You Will Learn
Part 1 Cross-disciplinary Methods in Sociolinguistics
Part 2 Web Interface - a novel approach for interactive
socio-analysis
Part 3 Working with Structured Data: LVS
Part 4 Working with Unstructured Data: ITMS
2
Our Materials - Web Site
https://languagevariationsuite.
wordpress.com/
3
What We Need
- Computer
- WiFi
- Smartphone (optional)
- R and Rstudio (not really)
4
What We Need
- Computer
- WiFi
- Smartphone (optional)
- R and Rstudio (not really)
4
Cross-disciplinary Methods
Our Objectives
Goal: Optimize sociolinguistic analysis by using a variety of
cross-disciplinary methods
1. Introduce a novel (socio-)linguistic toolkit for research and
as a teaching tool
2. Develop practical skills via interactive and visual interface
3. Understand and interpret advanced statistical and data
mining models
6
Cross-disciplinary Research
7
Corpus
Linguistics
Computational
Linguistics
Sociolinguistics
Data
Science
Statistics
Cross-disciplinary Research: Methods
8
Corpus
Linguistics
Computational
Linguistics
Sociolinguistics
Data
Science
Statistics
Cleaning
Pre-processing
Creating
Corpora
Parametric
Non-parametric
Models
Clustering
Topic Modeling
Cross-disciplinary Methods: A Closer Look
Random Forest - a more stable than stepwise variable selection
(Tagliamonte and Baayen,2012, 25)
Conditional Tree - a hierarchical tree of significant factors and
their relationship with other predictors
9
Cross-disciplinary Methods: A Closer Look
Clustering - a grouping of variables or individuals or
documents (Gries and Hilpert, 2008, 69)
Topic Modeling - a discovery of language patterns and word usage
over time and across genres (Blei, 2012,
78)
10
How To Unify All Methods: Data Science Workflow
1. Import data into R: read_csv(), read_line() [even praat
files!]
2. Tidy data - pre-processing
3. Transform with dplyr [select, subset, recode...]
4. Visualize with ggplot and plotly
5. Model with glm, lda, randomForest...
11
Tidyverse - a system of R packages for data manipulation,
exploration, and visualization that share a common design
philosophy (Wickham and Grolemund, 2017)
Sociolinguistic Data - Data Analytics
“Analytics is the critical technology
needed to bring value out of data”
(Anonymous)
12
Sociolinguistic Data - Visual Analytics
“The science of analytical reasoning
facilitated by visual interactive interfaces”
(Thomas and Cook, 2005)
“Visual analytics integrates new computational and
theory-based tools with innovative interactive techniques
and visual representations to enable human-information
discourse” (Thomas and Cook, 2005)
PositionSentence
p < 0.001
1
ind, pre post
Heaviness
p = 0.003
2
≤ 1 > 1
Period
p < 0.001
3
≤ 1 > 1
Node 4 (n = 81)
VOOV
0
0.2
0.4
0.6
0.8
1
Node 5 (n = 119)
VOOV
0
0.2
0.4
0.6
0.8
1
Node 6 (n = 181)
VOOV
0
0.2
0.4
0.6
0.8
1
Period
p < 0.001
7
≤ 2 > 2
Node 8 (n = 221)
VOOV
0
0.2
0.4
0.6
0.8
1
Focus
p < 0.001
9
cf nf
Node 10 (n = 66)
VOOV
0
0.2
0.4
0.6
0.8
1
Main_Verb_Structure
p < 0.001
11
ACIOther, Restructuring
Node 12 (n = 43)
VOOV
0
0.2
0.4
0.6
0.8
1
Node 13 (n = 265)
VOOV
0
0.2
0.4
0.6
0.8
1
13
Introduction to LVS
What is LVS?
Language Variation Suite
It is a Shiny web application designed for data analysis in
sociolinguistic research.
It can be used for:
Processing spreadsheet data
Reporting in tables and graphs
Analyzing means, regression, conditional trees ...
(and much more)
15
Background
LVS is built in R using Shiny package:
1. R - a free programming language for statistical computing
and graphics
2. Shiny App - a web application framework for R
Computational power of R + Web interactivity
16
Background
http://littleactuary.github.io/blog/Web-application-framework-with-Shiny/
17
Workspace
Browser
Chrome, Firefox, Safari - recommendable
Explorer may cause instability issues
Accessibility
PC, Mac, Linux
◦ Data files will be uploaded from any location on your
computer
Smart Phone
◦ Data files must be on a cloud platform connected to your
phone account (e.g. dropbox)
18
Server
Since LVS is hosted on a server, Shiny idle time-out settings may
stop application when it is left inactive (it will grey out).
Solution: Click reload and re-upload your csv file
19
Data Preparation
Data Preparation
Important things to consider before data entry:
File format:
◦ Comma separated value (CSV) - faster processing
◦ Excel format will slow processing
Column names should not contain spaces
◦ Permitted: non-accented characters, numbers, underscore,
hyphen, and period
One column must contain your dependent variable
The rest of the columns contain independent variables
21
Terminology Review
a. Categorical/Nominal - non-numerical data with two
values
◦ yes - no; deletion - retention; perfective - imperfective
b. Continuous - numerical data
◦ duration, age, chronological period
c. Multinomial - non-numerical data with three or more
values
◦ deletion - aspiration - retention
d. Ordinal - scale
22
Test Types
Dependent Independent Test
Continuous Continous t-test
Categorical Categorical Chi-square
Continuous Multiple Categorical/ Linear Regression
Continuous
Categorical Multiple Categorical/ Logistic Regression
Continuous Binary or Multinomial
(Dickinson, 2014)
23
24
Questions?
Comments?
24
Questions?
Comments?
Workshop Files
https://languagevariationsuite.wordpress.com/
1. categoricaldata.csv: categorical dependent - Labov New
York 1966 study
2. continuousdata.csv: continuous dependent - Intervocalic
/d/ in Caracas corpus (Díaz-Campos and Gradoville,
2011; Scrivner and Díaz-Campos, 2016)
3. LVS web site:
www.languagevariationsuite.com
25
LVS Structure
Language Variation Suite - Structure
1. Data
◦ Upload file, data summary, adjust data, cross tabulation
2. Visual Analysis
◦ Plotting, cluster classification
3. Inferential Statistics
◦ Modeling, regression, conditional trees, random forest
27
Language Variation Suite - Structure
1. Data
◦ Upload file, data summary, adjust data, cross tabulation
2. Visual Analysis
◦ Plotting, cluster classification
3. Inferential Statistics
◦ Modeling, regression, conditional trees, random forest
27
Upload File
Upload movie_metadata.csv
28
Excel Format
1. Slow processing
2. Requires the name of your excel sheet
29
TIPS: Save Your Excel as CSV Format
To optimize speed - Save as CSV prior upload
30
Upload File
Upload categoricaldata.csv
31
Uploaded Dataset
The data content is imported as a table and allows for sorting
columns.
32
Summary
Summary provides a quantitative summary for each variable,
e.g. frequency count, mean, median.
33
Data Structure
1. Total number of observations (rows)
2. Number of variables (columns)
3. Variable types
◦ Factor - categorical values
◦ Num - numeric values (0.95, 1.05)
◦ Int - integer values (1, 2, 3)
34
Cross Tabulation
Cross-tabulation examines the relationship between variables.
35
Variables
36
Cross-Tabulation Output
Raw frequency / Proportion by column / Proportion across
row
37
38
Questions?
Comments?
Visual Analysis
Language Variation Suite - Structure
1. Data
◦ Upload file, data summary, adjust data, cross tabulation
2. Visual Analysis
◦ Plotting, cluster classification
3. Inferential statistics
◦ Modeling, regression, varbrul analysis, conditional trees,
random forest
40
Adjusting Browser - Layout
Shiny pages are fluid and reactive.
41
One Variable Plot
42
Two Variables Plot
43
Saving Plot
44
Three Variables Plot
45
Cluster Plot
Classification of data into sub-groups is based on pairwise
similarities
Groups are clustered in the form of a tree-like dendrogram
46
Cluster Plot
47
Cluster Plot
Saks (upper middle-class store), Macy’s (middle-class store), Kleins
(working-class)
48
49
Questions?
Comments?
Inferential Analysis
Inferential Statistics
51
Language Variation Suite - Structure
1. Data
◦ Upload file, data summary, adjust data, cross tabulation
2. Visual Analysis
◦ Plotting, cluster classification
3. Inferential statistics
◦ Modeling, regression, varbrul analysis, conditional trees,
random forest
52
How to Create a Regression Model
Step 1 Modeling - create a model with dependent and
independent variables
Step 2 Regression - specify the type of regression (fixed,
mixed) and type of dependent variable (binary,
continuous, multinomial)
Step 3 Stepwise Regression - compare models
(Log-likelihood, AIC, BIC)
Step 4 Conditional Trees - apply non-parametric tests to
the model
53
Modeling
54
Modeling
54
We are interested in RETENTION
= Application
Regression Types
Model
a.) Fixed effect
b.) Mixed effect - individual speaker/token variation (within
group)
Type of Dependent Variable
a.) Binary/categorical (only two values)
b.) Continuous (numeric)
c.) Multinomial - categorical with more than two values
55
Regression
56
Model Output
57
Model Output
57
Interpretation
Deletion is the reference value
Positive coefficient - positive effect
Negative coefficient - negative effect
58
Interpretation - RETENTION
Lexical item Fourth has a negative effect on retention compared
to Floor and is significant
Normal style has a slightly negative effect on retention but its
coefficient is not significant
Macy’s and Saks have a positive and significant effect on
retention. Saks (upper middle class store) is more significant
than Macy’s (middle class store)
http://www.free-online-calculator-use.com/scientific-notation-converter.html
59
Interpretation - RETENTION
Lexical item Fourth has a negative effect on retention compared
to Floor and is significant
Normal style has a slightly negative effect on retention but its
coefficient is not significant
Macy’s and Saks have a positive and significant effect on
retention. Saks (upper middle class store) is more significant
than Macy’s (middle class store)
http://www.free-online-calculator-use.com/scientific-notation-converter.html
59
exponential notation:
1.48e-8
0.0000000148
60
Questions?
Comments?
Conditional Tree
Conditional tree: a simple non-parametric regression analysis,
commonly used in social and psychological studies
Linear regression: all information is combined linearly
Conditional tree regression: visual splitting to capture
interaction between variables
Recursive splitting (tree branches)
61
Conditional Tree - Tagliamonte and Baayen (2012)
1. The distribution of was/were is split in two groups by
individuals.
2. The variant were occurs significantly more frequently with the
first group.
62
Conditional Tree - Tagliamonte and Baayen (2012)
1. Polarity is relevant to the second group of individuals.
2. The variant were occurs significantly more often with negative
polarity
63
Conditional Tree - Tagliamonte and Baayen (2012)
1. Affirmative Polarity is conditioned by Age.
2. The variant was is produced significantly more often by
Individuals of 46 and younger.
64
Conditional Tree
65
Conditional Tree
1. Store is the most significant factor for R-use
◦ Kleins (working class store) - more R-deletion
2. R-use in Macy’s and Saks is conditioned by lexical item:
◦ Floor shows more R-retention than Fourth
3. Style is not significant
66
Model Output
67
Random Forest
1. Variable importance for predictors
2. Robust technique with small n large p data
3. All predictors considered jointly (allows for inclusion of
correlated factors)
68
Random Forest
69
Random Forest
1. Store is the most important predictor
2. Lexical Item is the second predictor
3. Style is irrelevant: close to zero and red dotted line (cut-off
value).
70
Mixed Effects
Fixed and Mixed Models
Fixed Effects Model : All predictors are treated independent.
Underlying assumption - no group-internal
variation between speakers or tokens
Mixed Effects Model : Allows for evaluation of individual- and
group-level variation
72
Fixed and Mixed Models
Fixed Regression Model - ignoring individual variations
(speakers or words) may lead to Type I Error:
“a chance effect is mistaken for a real difference
between the populations”
Mixed Regression Model - prone to Type II Error:
“if speaker variation is at a high level, we cannot
discern small population effects without a large
number of speakers” (Johnson 2009, 2015)
73
Mixed Effect Regression
Mixed Model = fixed effects + random effects
Fixed-effect factor - “repeatable and a small number of levels”
Random-effect factor - “a non-repeatable random sample from
a larger population” (Wieling 2012)
walk, sleep, study, finish, eat, etc
event verb, stative verb
speaker1, speaker3, speaker3, etc
male, female
74
Mixed Effect Regression
Mixed Model = fixed effects + random effects
Fixed-effect factor - “repeatable and a small number of levels”
Random-effect factor - “a non-repeatable random sample from
a larger population” (Wieling 2012)
walk, sleep, study, finish, eat, etc
event verb, stative verb
speaker1, speaker3, speaker3, etc
male, female
74
Preparing for Mixed Model
1. Download continuousdata.csv
2. Upload this file on LVS
75
Data - Uploaded Dataset
76
Mixed Effect Modeling
77
NULL when the dependent variable is continuous
Fixed Effects - independent variables
Mixed Effect Modeling
78
Mixed Effects - group-internal variation
Regression Results
79
Regression Results
79
Interpretation - Random Effects
1. Standard Deviation: a measure of the variability for each
random effect (speakers and tokens)
2. Residual: random variation that is not due to speakers or
tokens (residual error)
80
Interpretation - Fixed Effects
1. Estimate/coefficient: reported in log-odds (negative or
positive)
2. P-value: tells you if the level is significant
81
Bonus - Word Clouds
82
Frequency Plot
83
Frequency Plot
84
Applied Projects with LVS
Publications
- Scrivner, Olga., and Díaz-Campos, M. 2016. Language
Variation Suite: A theoretical and methodological
contribution for linguistic data analysis. Proceedings of the
Linguistic Society of America, 1, 29–1
- Estrada, Monica. 2016. El tú no es de nosotros, es de otros
países: Usos del voseo y actitudes hacia él en el castellano
hondureño. Master Thesis. Louisiana State University
- Orozco, Rafael. 2018. El castellano del Caribe colombiano
en la ciudad de Nueva York: El uso variable d sujetos
pronominales. In Studies in Hispanic and Lusophone
Linguistics 11(1): 89–129
86
Students Perspective: Feedback
“This program makes analysis so much more accessible to
someone like myself who is new to statistical analysis”
“I realize it is actually a tool of analysis every linguist will
appreciate a lot for quantitative analysis”
87
Teachers Perspective
88
References I
Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge: Cambridge University
Press
Bentivoglio, Paola and Mercedes Sedano. 1993. Investigación sociolingüística: sus métodos aplicados a una
experiencia venezolana. Boletín de Lingüística 8. 3-35
Díaz-Campos, Manuel and Stephanie Dickinson. In press. Using Statistics as a Tool in the Analysis of Sociolinguistic
Variation: A comparison of current and traditional methods. In Fernando Tejedo-Herrero (Ed.), Lusophone,
Galician, and Hispanic Linguistics: Bridging Frames and Traditions. Amsterdam: John Benjamins
Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber Randi Reppen (eds.), The
Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press
Labov, W. 1966. The Social Stratification of English in New York City. Washington: Center for Applied Linguistics
Scrivner, Olga., and Díaz-Campos, M. 2016. Language Variation Suite: A theoretical and methodological
contribution for linguistic data analysis. Proceedings of the Linguistic Society of America, 1, 29–1
Strobl, Carolin, James Malley, and Gerhard Tutz. An introduction to recursive partitioning: rationale, application,
and characteristics of classification and regression trees, bagging, and random forests. 2009. Psychological
methods 14: 323–348.
Tagliamonte, Sali, and R. Harald Baayen. 2012. Models, forests and trees of York English: Was/were variation as a
case study for statistical practice. Language Variation and Change 24: 135–178.
Tagliamonte, Sali. 2016. Quantitative analysis in language variation and change. In Fernando Tejedo-Herrero and
Sandro Sessarego (Eds.), Spanish Language and Sociolinguistic Analysis. Amsterdam: John Benjamins
89
Thank you!
90
Appendix
Appendix 1: Density
92
Number of bins can have a
disproportionate effect on visualization
Histogram
Density: a non-parametric model of the distribution of points based
on a smooth density estimate
http://scikit-learn.org/stable/modules/density.html
93
Appenix 2 - Data Modification
94
Adjust Data
Retain: Select data subset
Exclude: Exclude variables from a factor group
Recode: Combine and rename variables
Change class: Numeric → factor; factor → numeric
Transform: Apply log transformation to a specific column
ADJUSTED DATASET:
◦ Run - to apply all above changes
◦ Reset - to reset to the original dataset
95
Exclude: Emphatic Style
96
Adjusted Dataset
97
Adjusting Dataset
To revert to the original data, select RESET:
98
Appendix 3 - Model Comparison
99

More Related Content

What's hot

Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
Arshad Farhad
 
Data mining technique for classification and feature evaluation using stream ...
Data mining technique for classification and feature evaluation using stream ...Data mining technique for classification and feature evaluation using stream ...
Data mining technique for classification and feature evaluation using stream ...
ranjit banshpal
 
BAS 250 Lecture 8
BAS 250 Lecture 8BAS 250 Lecture 8
BAS 250 Lecture 8
Wake Tech BAS
 
Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Jerrin George
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Girish Khanzode
 
Unsupervised learning: Clustering
Unsupervised learning: ClusteringUnsupervised learning: Clustering
Unsupervised learning: Clustering
Deepak George
 
An Algorithm Analysis on Data Mining-396
An Algorithm Analysis on Data Mining-396An Algorithm Analysis on Data Mining-396
An Algorithm Analysis on Data Mining-396Nida Rashid
 
Data Mining
Data MiningData Mining
Data Mining
Jay Nagar
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining Algorithms
IJERA Editor
 
Data Mining: Concepts and Techniques — Chapter 2 —
Data Mining:  Concepts and Techniques — Chapter 2 —Data Mining:  Concepts and Techniques — Chapter 2 —
Data Mining: Concepts and Techniques — Chapter 2 —
Salah Amean
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
Valerii Klymchuk
 
slides
slidesslides
slidesbutest
 
Generative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variants
ananth
 
On the High Dimentional Information Processing in Quaternionic Domain and its...
On the High Dimentional Information Processing in Quaternionic Domain and its...On the High Dimentional Information Processing in Quaternionic Domain and its...
On the High Dimentional Information Processing in Quaternionic Domain and its...
IJAAS Team
 
2018 p 2019-ee-a2
2018 p 2019-ee-a22018 p 2019-ee-a2
2018 p 2019-ee-a2
uetian12
 
Statistical Pattern recognition(1)
Statistical Pattern recognition(1)Statistical Pattern recognition(1)
Statistical Pattern recognition(1)Syed Atif Naseem
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
zekeLabs Technologies
 

What's hot (18)

Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Data mining technique for classification and feature evaluation using stream ...
Data mining technique for classification and feature evaluation using stream ...Data mining technique for classification and feature evaluation using stream ...
Data mining technique for classification and feature evaluation using stream ...
 
BAS 250 Lecture 8
BAS 250 Lecture 8BAS 250 Lecture 8
BAS 250 Lecture 8
 
Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Unsupervised learning: Clustering
Unsupervised learning: ClusteringUnsupervised learning: Clustering
Unsupervised learning: Clustering
 
An Algorithm Analysis on Data Mining-396
An Algorithm Analysis on Data Mining-396An Algorithm Analysis on Data Mining-396
An Algorithm Analysis on Data Mining-396
 
Data Mining
Data MiningData Mining
Data Mining
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining Algorithms
 
Data Mining: Concepts and Techniques — Chapter 2 —
Data Mining:  Concepts and Techniques — Chapter 2 —Data Mining:  Concepts and Techniques — Chapter 2 —
Data Mining: Concepts and Techniques — Chapter 2 —
 
Mini_Project
Mini_ProjectMini_Project
Mini_Project
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
slides
slidesslides
slides
 
Generative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variants
 
On the High Dimentional Information Processing in Quaternionic Domain and its...
On the High Dimentional Information Processing in Quaternionic Domain and its...On the High Dimentional Information Processing in Quaternionic Domain and its...
On the High Dimentional Information Processing in Quaternionic Domain and its...
 
2018 p 2019-ee-a2
2018 p 2019-ee-a22018 p 2019-ee-a2
2018 p 2019-ee-a2
 
Statistical Pattern recognition(1)
Statistical Pattern recognition(1)Statistical Pattern recognition(1)
Statistical Pattern recognition(1)
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 

Similar to Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis

Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data science
TanujaSomvanshi1
 
Cssu dw dm
Cssu dw dmCssu dw dm
Cssu dw dmsumit621
 
Data Science and Analysis.pptx
Data Science and Analysis.pptxData Science and Analysis.pptx
Data Science and Analysis.pptx
PrashantYadav931011
 
Water Quality Index Calculation of River Ganga using Decision Tree Algorithm
Water Quality Index Calculation of River Ganga using Decision Tree AlgorithmWater Quality Index Calculation of River Ganga using Decision Tree Algorithm
Water Quality Index Calculation of River Ganga using Decision Tree Algorithm
IRJET Journal
 
Performance Evaluation: A Comparative Study of Various Classifiers
Performance Evaluation: A Comparative Study of Various ClassifiersPerformance Evaluation: A Comparative Study of Various Classifiers
Performance Evaluation: A Comparative Study of Various Classifiers
amreshkr19
 
Lecture-8-The-GIS-Database-Part-1.ppt
Lecture-8-The-GIS-Database-Part-1.pptLecture-8-The-GIS-Database-Part-1.ppt
Lecture-8-The-GIS-Database-Part-1.ppt
Prabin Pandit
 
Data science technology overview
Data science technology overviewData science technology overview
Data science technology overview
Soojung Hong
 
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...
Till Blume
 
EE-232-LEC-01 Data_structures.pptx
EE-232-LEC-01 Data_structures.pptxEE-232-LEC-01 Data_structures.pptx
EE-232-LEC-01 Data_structures.pptx
iamultapromax
 
fINAL ML PPT.pptx
fINAL ML PPT.pptxfINAL ML PPT.pptx
fINAL ML PPT.pptx
19445KNithinbabu
 
DA ST-1 SET-B-Solution.pdf we also provide the many type of solution
DA ST-1 SET-B-Solution.pdf we also provide the many type of solutionDA ST-1 SET-B-Solution.pdf we also provide the many type of solution
DA ST-1 SET-B-Solution.pdf we also provide the many type of solution
gitikasingh2004
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
Katy Allen
 
8.unit-1-fds-2022-23.pptx
8.unit-1-fds-2022-23.pptx8.unit-1-fds-2022-23.pptx
8.unit-1-fds-2022-23.pptx
RavishankarBhaganaga
 
Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)
DheerajPachauri
 
Qda ces 2013 toronto workshop
Qda ces 2013 toronto workshopQda ces 2013 toronto workshop
Qda ces 2013 toronto workshop
CesToronto
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
Subrat Panda, PhD
 
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
The Statistical and Applied Mathematical Sciences Institute
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
Benjamin Bengfort
 
Data analytcis-first-steps
Data analytcis-first-stepsData analytcis-first-steps
Data analytcis-first-steps
Shesha R
 
KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdf
Dr. Radhey Shyam
 

Similar to Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis (20)

Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data science
 
Cssu dw dm
Cssu dw dmCssu dw dm
Cssu dw dm
 
Data Science and Analysis.pptx
Data Science and Analysis.pptxData Science and Analysis.pptx
Data Science and Analysis.pptx
 
Water Quality Index Calculation of River Ganga using Decision Tree Algorithm
Water Quality Index Calculation of River Ganga using Decision Tree AlgorithmWater Quality Index Calculation of River Ganga using Decision Tree Algorithm
Water Quality Index Calculation of River Ganga using Decision Tree Algorithm
 
Performance Evaluation: A Comparative Study of Various Classifiers
Performance Evaluation: A Comparative Study of Various ClassifiersPerformance Evaluation: A Comparative Study of Various Classifiers
Performance Evaluation: A Comparative Study of Various Classifiers
 
Lecture-8-The-GIS-Database-Part-1.ppt
Lecture-8-The-GIS-Database-Part-1.pptLecture-8-The-GIS-Database-Part-1.ppt
Lecture-8-The-GIS-Database-Part-1.ppt
 
Data science technology overview
Data science technology overviewData science technology overview
Data science technology overview
 
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...
 
EE-232-LEC-01 Data_structures.pptx
EE-232-LEC-01 Data_structures.pptxEE-232-LEC-01 Data_structures.pptx
EE-232-LEC-01 Data_structures.pptx
 
fINAL ML PPT.pptx
fINAL ML PPT.pptxfINAL ML PPT.pptx
fINAL ML PPT.pptx
 
DA ST-1 SET-B-Solution.pdf we also provide the many type of solution
DA ST-1 SET-B-Solution.pdf we also provide the many type of solutionDA ST-1 SET-B-Solution.pdf we also provide the many type of solution
DA ST-1 SET-B-Solution.pdf we also provide the many type of solution
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 
8.unit-1-fds-2022-23.pptx
8.unit-1-fds-2022-23.pptx8.unit-1-fds-2022-23.pptx
8.unit-1-fds-2022-23.pptx
 
Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)
 
Qda ces 2013 toronto workshop
Qda ces 2013 toronto workshopQda ces 2013 toronto workshop
Qda ces 2013 toronto workshop
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Data analytcis-first-steps
Data analytcis-first-stepsData analytcis-first-steps
Data analytcis-first-steps
 
KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdf
 

More from Olga Scrivner

Engaging Students Competition and Polls.pptx
Engaging Students Competition and Polls.pptxEngaging Students Competition and Polls.pptx
Engaging Students Competition and Polls.pptx
Olga Scrivner
 
HICSS ATLT: Advances in Teaching and Learning Technologies
HICSS ATLT: Advances in Teaching and Learning TechnologiesHICSS ATLT: Advances in Teaching and Learning Technologies
HICSS ATLT: Advances in Teaching and Learning Technologies
Olga Scrivner
 
The power of unstructured data: Recommendation systems
The power of unstructured data: Recommendation systemsThe power of unstructured data: Recommendation systems
The power of unstructured data: Recommendation systems
Olga Scrivner
 
Cognitive executive functions and Opioid Use Disorder
Cognitive executive functions and Opioid Use DisorderCognitive executive functions and Opioid Use Disorder
Cognitive executive functions and Opioid Use Disorder
Olga Scrivner
 
Introduction to Web Scraping with Python
Introduction to Web Scraping with PythonIntroduction to Web Scraping with Python
Introduction to Web Scraping with Python
Olga Scrivner
 
Call for paper Collaboration Systems and Technology
Call for paper Collaboration Systems and TechnologyCall for paper Collaboration Systems and Technology
Call for paper Collaboration Systems and Technology
Olga Scrivner
 
Jupyter machine learning crash course
Jupyter machine learning crash courseJupyter machine learning crash course
Jupyter machine learning crash course
Olga Scrivner
 
R and RMarkdown crash course
R and RMarkdown crash courseR and RMarkdown crash course
R and RMarkdown crash course
Olga Scrivner
 
The Impact of Language Requirement on Students' Performance, Retention, and M...
The Impact of Language Requirement on Students' Performance, Retention, and M...The Impact of Language Requirement on Students' Performance, Retention, and M...
The Impact of Language Requirement on Students' Performance, Retention, and M...
Olga Scrivner
 
If a picture is worth a thousand words, Interactive data visualizations are w...
If a picture is worth a thousand words, Interactive data visualizations are w...If a picture is worth a thousand words, Interactive data visualizations are w...
If a picture is worth a thousand words, Interactive data visualizations are w...
Olga Scrivner
 
Introduction to Interactive Shiny Web Application
Introduction to Interactive Shiny Web ApplicationIntroduction to Interactive Shiny Web Application
Introduction to Interactive Shiny Web Application
Olga Scrivner
 
Introduction to Overleaf Workshop
Introduction to Overleaf WorkshopIntroduction to Overleaf Workshop
Introduction to Overleaf Workshop
Olga Scrivner
 
R crash course for Business Analytics Course K303
R crash course for Business Analytics Course K303R crash course for Business Analytics Course K303
R crash course for Business Analytics Course K303
Olga Scrivner
 
Gender Disparity in Employment and Education
Gender Disparity in Employment and EducationGender Disparity in Employment and Education
Gender Disparity in Employment and Education
Olga Scrivner
 
CrashCourse: Python with DataCamp and Jupyter for Beginners
CrashCourse: Python with DataCamp and Jupyter for BeginnersCrashCourse: Python with DataCamp and Jupyter for Beginners
CrashCourse: Python with DataCamp and Jupyter for Beginners
Olga Scrivner
 
Optimizing Data Analysis: Web application with Shiny
Optimizing Data Analysis: Web application with ShinyOptimizing Data Analysis: Web application with Shiny
Optimizing Data Analysis: Web application with Shiny
Olga Scrivner
 
Data Analysis and Visualization: R Workflow
Data Analysis and Visualization: R WorkflowData Analysis and Visualization: R Workflow
Data Analysis and Visualization: R Workflow
Olga Scrivner
 
Reproducible visual analytics of public opioid data
Reproducible visual analytics of public opioid dataReproducible visual analytics of public opioid data
Reproducible visual analytics of public opioid data
Olga Scrivner
 
Building Effective Visualization Shiny WVF
Building Effective Visualization Shiny WVFBuilding Effective Visualization Shiny WVF
Building Effective Visualization Shiny WVF
Olga Scrivner
 
Building Shiny Application Series - Layout and HTML
Building Shiny Application Series - Layout and HTMLBuilding Shiny Application Series - Layout and HTML
Building Shiny Application Series - Layout and HTML
Olga Scrivner
 

More from Olga Scrivner (20)

Engaging Students Competition and Polls.pptx
Engaging Students Competition and Polls.pptxEngaging Students Competition and Polls.pptx
Engaging Students Competition and Polls.pptx
 
HICSS ATLT: Advances in Teaching and Learning Technologies
HICSS ATLT: Advances in Teaching and Learning TechnologiesHICSS ATLT: Advances in Teaching and Learning Technologies
HICSS ATLT: Advances in Teaching and Learning Technologies
 
The power of unstructured data: Recommendation systems
The power of unstructured data: Recommendation systemsThe power of unstructured data: Recommendation systems
The power of unstructured data: Recommendation systems
 
Cognitive executive functions and Opioid Use Disorder
Cognitive executive functions and Opioid Use DisorderCognitive executive functions and Opioid Use Disorder
Cognitive executive functions and Opioid Use Disorder
 
Introduction to Web Scraping with Python
Introduction to Web Scraping with PythonIntroduction to Web Scraping with Python
Introduction to Web Scraping with Python
 
Call for paper Collaboration Systems and Technology
Call for paper Collaboration Systems and TechnologyCall for paper Collaboration Systems and Technology
Call for paper Collaboration Systems and Technology
 
Jupyter machine learning crash course
Jupyter machine learning crash courseJupyter machine learning crash course
Jupyter machine learning crash course
 
R and RMarkdown crash course
R and RMarkdown crash courseR and RMarkdown crash course
R and RMarkdown crash course
 
The Impact of Language Requirement on Students' Performance, Retention, and M...
The Impact of Language Requirement on Students' Performance, Retention, and M...The Impact of Language Requirement on Students' Performance, Retention, and M...
The Impact of Language Requirement on Students' Performance, Retention, and M...
 
If a picture is worth a thousand words, Interactive data visualizations are w...
If a picture is worth a thousand words, Interactive data visualizations are w...If a picture is worth a thousand words, Interactive data visualizations are w...
If a picture is worth a thousand words, Interactive data visualizations are w...
 
Introduction to Interactive Shiny Web Application
Introduction to Interactive Shiny Web ApplicationIntroduction to Interactive Shiny Web Application
Introduction to Interactive Shiny Web Application
 
Introduction to Overleaf Workshop
Introduction to Overleaf WorkshopIntroduction to Overleaf Workshop
Introduction to Overleaf Workshop
 
R crash course for Business Analytics Course K303
R crash course for Business Analytics Course K303R crash course for Business Analytics Course K303
R crash course for Business Analytics Course K303
 
Gender Disparity in Employment and Education
Gender Disparity in Employment and EducationGender Disparity in Employment and Education
Gender Disparity in Employment and Education
 
CrashCourse: Python with DataCamp and Jupyter for Beginners
CrashCourse: Python with DataCamp and Jupyter for BeginnersCrashCourse: Python with DataCamp and Jupyter for Beginners
CrashCourse: Python with DataCamp and Jupyter for Beginners
 
Optimizing Data Analysis: Web application with Shiny
Optimizing Data Analysis: Web application with ShinyOptimizing Data Analysis: Web application with Shiny
Optimizing Data Analysis: Web application with Shiny
 
Data Analysis and Visualization: R Workflow
Data Analysis and Visualization: R WorkflowData Analysis and Visualization: R Workflow
Data Analysis and Visualization: R Workflow
 
Reproducible visual analytics of public opioid data
Reproducible visual analytics of public opioid dataReproducible visual analytics of public opioid data
Reproducible visual analytics of public opioid data
 
Building Effective Visualization Shiny WVF
Building Effective Visualization Shiny WVFBuilding Effective Visualization Shiny WVF
Building Effective Visualization Shiny WVF
 
Building Shiny Application Series - Layout and HTML
Building Shiny Application Series - Layout and HTMLBuilding Shiny Application Series - Layout and HTML
Building Shiny Application Series - Layout and HTML
 

Recently uploaded

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 

Recently uploaded (20)

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 

Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis

  • 1. Language Variation Suite Toolkit Olga Scrivner, Rafael Orozco, Manuel Díaz-Campos NWAV47, October 18, 2018 Indiana University and Louisiana State University
  • 3. What You Will Learn Part 1 Cross-disciplinary Methods in Sociolinguistics Part 2 Web Interface - a novel approach for interactive socio-analysis Part 3 Working with Structured Data: LVS Part 4 Working with Unstructured Data: ITMS 2
  • 4. Our Materials - Web Site https://languagevariationsuite. wordpress.com/ 3
  • 5. What We Need - Computer - WiFi - Smartphone (optional) - R and Rstudio (not really) 4
  • 6. What We Need - Computer - WiFi - Smartphone (optional) - R and Rstudio (not really) 4
  • 8. Our Objectives Goal: Optimize sociolinguistic analysis by using a variety of cross-disciplinary methods 1. Introduce a novel (socio-)linguistic toolkit for research and as a teaching tool 2. Develop practical skills via interactive and visual interface 3. Understand and interpret advanced statistical and data mining models 6
  • 11. Cross-disciplinary Methods: A Closer Look Random Forest - a more stable than stepwise variable selection (Tagliamonte and Baayen,2012, 25) Conditional Tree - a hierarchical tree of significant factors and their relationship with other predictors 9
  • 12. Cross-disciplinary Methods: A Closer Look Clustering - a grouping of variables or individuals or documents (Gries and Hilpert, 2008, 69) Topic Modeling - a discovery of language patterns and word usage over time and across genres (Blei, 2012, 78) 10
  • 13. How To Unify All Methods: Data Science Workflow 1. Import data into R: read_csv(), read_line() [even praat files!] 2. Tidy data - pre-processing 3. Transform with dplyr [select, subset, recode...] 4. Visualize with ggplot and plotly 5. Model with glm, lda, randomForest... 11 Tidyverse - a system of R packages for data manipulation, exploration, and visualization that share a common design philosophy (Wickham and Grolemund, 2017)
  • 14. Sociolinguistic Data - Data Analytics “Analytics is the critical technology needed to bring value out of data” (Anonymous) 12
  • 15. Sociolinguistic Data - Visual Analytics “The science of analytical reasoning facilitated by visual interactive interfaces” (Thomas and Cook, 2005) “Visual analytics integrates new computational and theory-based tools with innovative interactive techniques and visual representations to enable human-information discourse” (Thomas and Cook, 2005) PositionSentence p < 0.001 1 ind, pre post Heaviness p = 0.003 2 ≤ 1 > 1 Period p < 0.001 3 ≤ 1 > 1 Node 4 (n = 81) VOOV 0 0.2 0.4 0.6 0.8 1 Node 5 (n = 119) VOOV 0 0.2 0.4 0.6 0.8 1 Node 6 (n = 181) VOOV 0 0.2 0.4 0.6 0.8 1 Period p < 0.001 7 ≤ 2 > 2 Node 8 (n = 221) VOOV 0 0.2 0.4 0.6 0.8 1 Focus p < 0.001 9 cf nf Node 10 (n = 66) VOOV 0 0.2 0.4 0.6 0.8 1 Main_Verb_Structure p < 0.001 11 ACIOther, Restructuring Node 12 (n = 43) VOOV 0 0.2 0.4 0.6 0.8 1 Node 13 (n = 265) VOOV 0 0.2 0.4 0.6 0.8 1 13
  • 17. What is LVS? Language Variation Suite It is a Shiny web application designed for data analysis in sociolinguistic research. It can be used for: Processing spreadsheet data Reporting in tables and graphs Analyzing means, regression, conditional trees ... (and much more) 15
  • 18. Background LVS is built in R using Shiny package: 1. R - a free programming language for statistical computing and graphics 2. Shiny App - a web application framework for R Computational power of R + Web interactivity 16
  • 20. Workspace Browser Chrome, Firefox, Safari - recommendable Explorer may cause instability issues Accessibility PC, Mac, Linux ◦ Data files will be uploaded from any location on your computer Smart Phone ◦ Data files must be on a cloud platform connected to your phone account (e.g. dropbox) 18
  • 21. Server Since LVS is hosted on a server, Shiny idle time-out settings may stop application when it is left inactive (it will grey out). Solution: Click reload and re-upload your csv file 19
  • 23. Data Preparation Important things to consider before data entry: File format: ◦ Comma separated value (CSV) - faster processing ◦ Excel format will slow processing Column names should not contain spaces ◦ Permitted: non-accented characters, numbers, underscore, hyphen, and period One column must contain your dependent variable The rest of the columns contain independent variables 21
  • 24. Terminology Review a. Categorical/Nominal - non-numerical data with two values ◦ yes - no; deletion - retention; perfective - imperfective b. Continuous - numerical data ◦ duration, age, chronological period c. Multinomial - non-numerical data with three or more values ◦ deletion - aspiration - retention d. Ordinal - scale 22
  • 25. Test Types Dependent Independent Test Continuous Continous t-test Categorical Categorical Chi-square Continuous Multiple Categorical/ Linear Regression Continuous Categorical Multiple Categorical/ Logistic Regression Continuous Binary or Multinomial (Dickinson, 2014) 23
  • 28. Workshop Files https://languagevariationsuite.wordpress.com/ 1. categoricaldata.csv: categorical dependent - Labov New York 1966 study 2. continuousdata.csv: continuous dependent - Intervocalic /d/ in Caracas corpus (Díaz-Campos and Gradoville, 2011; Scrivner and Díaz-Campos, 2016) 3. LVS web site: www.languagevariationsuite.com 25
  • 30. Language Variation Suite - Structure 1. Data ◦ Upload file, data summary, adjust data, cross tabulation 2. Visual Analysis ◦ Plotting, cluster classification 3. Inferential Statistics ◦ Modeling, regression, conditional trees, random forest 27
  • 31. Language Variation Suite - Structure 1. Data ◦ Upload file, data summary, adjust data, cross tabulation 2. Visual Analysis ◦ Plotting, cluster classification 3. Inferential Statistics ◦ Modeling, regression, conditional trees, random forest 27
  • 33. Excel Format 1. Slow processing 2. Requires the name of your excel sheet 29
  • 34. TIPS: Save Your Excel as CSV Format To optimize speed - Save as CSV prior upload 30
  • 36. Uploaded Dataset The data content is imported as a table and allows for sorting columns. 32
  • 37. Summary Summary provides a quantitative summary for each variable, e.g. frequency count, mean, median. 33
  • 38. Data Structure 1. Total number of observations (rows) 2. Number of variables (columns) 3. Variable types ◦ Factor - categorical values ◦ Num - numeric values (0.95, 1.05) ◦ Int - integer values (1, 2, 3) 34
  • 39. Cross Tabulation Cross-tabulation examines the relationship between variables. 35
  • 41. Cross-Tabulation Output Raw frequency / Proportion by column / Proportion across row 37
  • 44. Language Variation Suite - Structure 1. Data ◦ Upload file, data summary, adjust data, cross tabulation 2. Visual Analysis ◦ Plotting, cluster classification 3. Inferential statistics ◦ Modeling, regression, varbrul analysis, conditional trees, random forest 40
  • 45. Adjusting Browser - Layout Shiny pages are fluid and reactive. 41
  • 50. Cluster Plot Classification of data into sub-groups is based on pairwise similarities Groups are clustered in the form of a tree-like dendrogram 46
  • 52. Cluster Plot Saks (upper middle-class store), Macy’s (middle-class store), Kleins (working-class) 48
  • 56. Language Variation Suite - Structure 1. Data ◦ Upload file, data summary, adjust data, cross tabulation 2. Visual Analysis ◦ Plotting, cluster classification 3. Inferential statistics ◦ Modeling, regression, varbrul analysis, conditional trees, random forest 52
  • 57. How to Create a Regression Model Step 1 Modeling - create a model with dependent and independent variables Step 2 Regression - specify the type of regression (fixed, mixed) and type of dependent variable (binary, continuous, multinomial) Step 3 Stepwise Regression - compare models (Log-likelihood, AIC, BIC) Step 4 Conditional Trees - apply non-parametric tests to the model 53
  • 59. Modeling 54 We are interested in RETENTION = Application
  • 60. Regression Types Model a.) Fixed effect b.) Mixed effect - individual speaker/token variation (within group) Type of Dependent Variable a.) Binary/categorical (only two values) b.) Continuous (numeric) c.) Multinomial - categorical with more than two values 55
  • 64. Interpretation Deletion is the reference value Positive coefficient - positive effect Negative coefficient - negative effect 58
  • 65. Interpretation - RETENTION Lexical item Fourth has a negative effect on retention compared to Floor and is significant Normal style has a slightly negative effect on retention but its coefficient is not significant Macy’s and Saks have a positive and significant effect on retention. Saks (upper middle class store) is more significant than Macy’s (middle class store) http://www.free-online-calculator-use.com/scientific-notation-converter.html 59
  • 66. Interpretation - RETENTION Lexical item Fourth has a negative effect on retention compared to Floor and is significant Normal style has a slightly negative effect on retention but its coefficient is not significant Macy’s and Saks have a positive and significant effect on retention. Saks (upper middle class store) is more significant than Macy’s (middle class store) http://www.free-online-calculator-use.com/scientific-notation-converter.html 59 exponential notation: 1.48e-8 0.0000000148
  • 68. Conditional Tree Conditional tree: a simple non-parametric regression analysis, commonly used in social and psychological studies Linear regression: all information is combined linearly Conditional tree regression: visual splitting to capture interaction between variables Recursive splitting (tree branches) 61
  • 69. Conditional Tree - Tagliamonte and Baayen (2012) 1. The distribution of was/were is split in two groups by individuals. 2. The variant were occurs significantly more frequently with the first group. 62
  • 70. Conditional Tree - Tagliamonte and Baayen (2012) 1. Polarity is relevant to the second group of individuals. 2. The variant were occurs significantly more often with negative polarity 63
  • 71. Conditional Tree - Tagliamonte and Baayen (2012) 1. Affirmative Polarity is conditioned by Age. 2. The variant was is produced significantly more often by Individuals of 46 and younger. 64
  • 73. Conditional Tree 1. Store is the most significant factor for R-use ◦ Kleins (working class store) - more R-deletion 2. R-use in Macy’s and Saks is conditioned by lexical item: ◦ Floor shows more R-retention than Fourth 3. Style is not significant 66
  • 75. Random Forest 1. Variable importance for predictors 2. Robust technique with small n large p data 3. All predictors considered jointly (allows for inclusion of correlated factors) 68
  • 77. Random Forest 1. Store is the most important predictor 2. Lexical Item is the second predictor 3. Style is irrelevant: close to zero and red dotted line (cut-off value). 70
  • 79. Fixed and Mixed Models Fixed Effects Model : All predictors are treated independent. Underlying assumption - no group-internal variation between speakers or tokens Mixed Effects Model : Allows for evaluation of individual- and group-level variation 72
  • 80. Fixed and Mixed Models Fixed Regression Model - ignoring individual variations (speakers or words) may lead to Type I Error: “a chance effect is mistaken for a real difference between the populations” Mixed Regression Model - prone to Type II Error: “if speaker variation is at a high level, we cannot discern small population effects without a large number of speakers” (Johnson 2009, 2015) 73
  • 81. Mixed Effect Regression Mixed Model = fixed effects + random effects Fixed-effect factor - “repeatable and a small number of levels” Random-effect factor - “a non-repeatable random sample from a larger population” (Wieling 2012) walk, sleep, study, finish, eat, etc event verb, stative verb speaker1, speaker3, speaker3, etc male, female 74
  • 82. Mixed Effect Regression Mixed Model = fixed effects + random effects Fixed-effect factor - “repeatable and a small number of levels” Random-effect factor - “a non-repeatable random sample from a larger population” (Wieling 2012) walk, sleep, study, finish, eat, etc event verb, stative verb speaker1, speaker3, speaker3, etc male, female 74
  • 83. Preparing for Mixed Model 1. Download continuousdata.csv 2. Upload this file on LVS 75
  • 84. Data - Uploaded Dataset 76
  • 85. Mixed Effect Modeling 77 NULL when the dependent variable is continuous Fixed Effects - independent variables
  • 86. Mixed Effect Modeling 78 Mixed Effects - group-internal variation
  • 89. Interpretation - Random Effects 1. Standard Deviation: a measure of the variability for each random effect (speakers and tokens) 2. Residual: random variation that is not due to speakers or tokens (residual error) 80
  • 90. Interpretation - Fixed Effects 1. Estimate/coefficient: reported in log-odds (negative or positive) 2. P-value: tells you if the level is significant 81
  • 91. Bonus - Word Clouds 82
  • 95. Publications - Scrivner, Olga., and Díaz-Campos, M. 2016. Language Variation Suite: A theoretical and methodological contribution for linguistic data analysis. Proceedings of the Linguistic Society of America, 1, 29–1 - Estrada, Monica. 2016. El tú no es de nosotros, es de otros países: Usos del voseo y actitudes hacia él en el castellano hondureño. Master Thesis. Louisiana State University - Orozco, Rafael. 2018. El castellano del Caribe colombiano en la ciudad de Nueva York: El uso variable d sujetos pronominales. In Studies in Hispanic and Lusophone Linguistics 11(1): 89–129 86
  • 96. Students Perspective: Feedback “This program makes analysis so much more accessible to someone like myself who is new to statistical analysis” “I realize it is actually a tool of analysis every linguist will appreciate a lot for quantitative analysis” 87
  • 98. References I Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge: Cambridge University Press Bentivoglio, Paola and Mercedes Sedano. 1993. Investigación sociolingüística: sus métodos aplicados a una experiencia venezolana. Boletín de Lingüística 8. 3-35 Díaz-Campos, Manuel and Stephanie Dickinson. In press. Using Statistics as a Tool in the Analysis of Sociolinguistic Variation: A comparison of current and traditional methods. In Fernando Tejedo-Herrero (Ed.), Lusophone, Galician, and Hispanic Linguistics: Bridging Frames and Traditions. Amsterdam: John Benjamins Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber Randi Reppen (eds.), The Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press Labov, W. 1966. The Social Stratification of English in New York City. Washington: Center for Applied Linguistics Scrivner, Olga., and Díaz-Campos, M. 2016. Language Variation Suite: A theoretical and methodological contribution for linguistic data analysis. Proceedings of the Linguistic Society of America, 1, 29–1 Strobl, Carolin, James Malley, and Gerhard Tutz. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. 2009. Psychological methods 14: 323–348. Tagliamonte, Sali, and R. Harald Baayen. 2012. Models, forests and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24: 135–178. Tagliamonte, Sali. 2016. Quantitative analysis in language variation and change. In Fernando Tejedo-Herrero and Sandro Sessarego (Eds.), Spanish Language and Sociolinguistic Analysis. Amsterdam: John Benjamins 89
  • 101. Appendix 1: Density 92 Number of bins can have a disproportionate effect on visualization
  • 102. Histogram Density: a non-parametric model of the distribution of points based on a smooth density estimate http://scikit-learn.org/stable/modules/density.html 93
  • 103. Appenix 2 - Data Modification 94
  • 104. Adjust Data Retain: Select data subset Exclude: Exclude variables from a factor group Recode: Combine and rename variables Change class: Numeric → factor; factor → numeric Transform: Apply log transformation to a specific column ADJUSTED DATASET: ◦ Run - to apply all above changes ◦ Reset - to reset to the original dataset 95
  • 107. Adjusting Dataset To revert to the original data, select RESET: 98
  • 108. Appendix 3 - Model Comparison 99