R in the Humanities: Text Analysis
Dr Leah Henrickson
Lecturer in Digital Media
School of Media and Communication
University of Leeds
L.R.Henrickson@leeds.ac.uk
twitter.com/leahhenrickson
Who am I?
• A Lecturer in Digital Media
• A book historian
• A digital humanist
• Canadian 🍁
R in the Humanities: Text Analysis
Session 1:
Gettin’ to Grips with R
CC Image: https://en.wikipedia.org/wiki/File:Piratey,_vector_version.svg
Overview
This course is a gentle introduction to R for text analysis. Over the course of two sessions you will be taught the basics of the
powerful programming language before being provided with hands-on experience analysing long-form text in the RStudio
development environment.
By the end of the course, you will be able to:
• Navigate the RStudio development environment
• Prepare long-form prose texts for computational analysis using R
• Conduct basic computational analyses of long-form prose texts
• Construct and explain visualisations of computed results
• Critically apply computational text analysis to complement other analytical methods
To complete this course you will need to install:
• R version 3.6 or higher (download at https://www.r-project.org)
• RStudio Desktop: Open Source Edition 1.2 or higher (download at https://www.rstudio.com/products/rstudio)
Session 1 Agenda
1. What are R and RStudio?
2. What can R help you do?
3. A quick note about Computational Literary Studies (CLS)
4. Getting started with R
5. Cleaning text
CC Image: https://pixabay.com/photos/dog-laptop-computer-glasses-2983021
What are R and RStudio?
R is:
• a programming language
• a software environment
• a really fancy calculator
• free/open source
Download: https://cran.r-project.org/mirrors.html
RStudio is:
• an integrated development environment (IDE)
• a great way to make your coding experiences easier, more colourful,
and more fun!
Download: https://www.rstudio.com/products/rstudio/download
What can R help you do?
• Count words
• Find linguistic patterns within and across texts
• Compare texts
• Make pretty pictures
But it’s still up to you to explain results.
Also, is R always the most appropriate tool?
CC Image: https://pixabay.com/photos/letters-tiles-word-game-crossword-4938486
A quick note about Computational Literary
Studies (CLS)
CLS has a long history (for example, Father Roberto Busa, ~1940s),
but has been criticised for:
• Misinterpretation of statistical data (Da)
• Unchecked enthusiasm for technological ‘hype’ (Kirsch)
• Turning literature into data and neglecting reception of works
(Marche)
Da, Nan Z. “The Computational Case against Computational Literary Studies.” Critical Inquiry, vol. 45, 2019,
pp. 601-639.
Kirsch, Adam. “Technology Is Taking Over English Departments.” The New Republic, 2014,
https://newrepublic.com/article/117428/limits-digital-humanities-adam-kirsch. Accessed 21 December 2020.
Marche, Stephen. “Literature Is not Data: Against Digital Humanities.” The Los Angeles Review of Books,
2012, https://lareviewofbooks.org/article/literature-is-not-data-against-digital-humanities. Accessed 21
December 2020.
CC Image: https://melissaterras.org/2013/10/15/for-ada-lovelace-day-father-busas-female-punch-card-operatives
Let’s get started!
Double click ‘Terminal’.
Terminal (write your script)
Console (run your script)
Environment (your data)
Everything else!
The Basics (1/2)
Calculating
• 10 + 2 (spaces optional)
• 10 - 2
• 10 * 2
• 10 / 2
Strings and Things
• 1:50
• print("Hello world!!")
• [variable name] <- c(1, 2, 3)
• [variable name][2]
Meme: https://knowyourmeme.com/memes/math-lady-confused-lady
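The commands on this slide can be tried together in the console. The following is a minimal sketch; the names `one_to_fifty` and `my_numbers` are placeholders chosen for this example (standing in for `[variable name]`), not names from the course materials.

```r
# Arithmetic: R works as a calculator (spaces around operators are optional)
10 + 2
10 - 2
10 * 2
10 / 2

# The colon operator builds a sequence of whole numbers, here 1 to 50
one_to_fifty <- 1:50
length(one_to_fifty)

# print() displays a value; text (a "string") goes inside straight quotes
print("Hello world!!")

# c() combines values into a vector; <- stores it under a name
# (my_numbers is a hypothetical name chosen for this example)
my_numbers <- c(1, 2, 3)

# Square brackets pick out an element by position, counting from 1
my_numbers[2]
```

Note that R needs straight quotes and a plain hyphen for subtraction; curly quotes or long dashes pasted from slides or word processors will cause errors.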
The Basics (2/2)
• Data types: character, numeric, integer, logical, complex
• Data structures: vector, list, matrix, data frame, factors
• Keep notes using #
• Need help?
• ?____________
• help()
• install.packages("[name of package]")
Meme: https://www.reddit.com/r/ProgrammerHumor/comments/8w54mx/code_comments_be_like
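The data types and data structures listed above can be inspected directly. This is a small sketch using base R only; every value and variable name in it is invented for illustration.

```r
# class() reports the data type of a value
class("suffrage")   # character
class(3.14)         # numeric
class(2L)           # integer (the L suffix marks a whole number)
class(TRUE)         # logical
class(1 + 2i)       # complex

# A vector holds values of one type; a list can mix types
word_lengths <- c(5, 8, 3)
mixed <- list("debate", 10, TRUE)

# A matrix is a two-dimensional grid of values of one type
grid <- matrix(1:6, nrow = 2)

# A data frame is a table whose columns can have different types
# (the words and counts are made-up illustration values)
counts <- data.frame(word = c("women", "vote"), count = c(10, 7))

# A factor stores categories, such as a country label for each text
country <- factor(c("US", "UK", "US"))
levels(country)

# Anything after a # is a note to yourself and is ignored by R
```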
Tools > Global Options >
Appearance
(You will need to restart
RStudio to apply these
changes).
Let’s clean some text!
CC Image: https://thenounproject.com/term/cleaning/199037
You can use whatever corpus you’d like for this course.
However, I have prepared a corpus of six texts for you. You may download the corpus at http://tinyurl.com/n8texts.
This corpus includes six public domain texts (1870-1914) about the women’s suffrage movement in the United States and the
United Kingdom:
• debate: Debate on Woman Suffrage in the Senate of the United States (https://www.gutenberg.org/ebooks/11114)
• femalesuffrage: Female Suffrage: A Letter to the Christian Women of America, Susan Fenimore Cooper
(https://www.gutenberg.org/ebooks/2157)
• myownstory: My Own Story, Emmeline Pankhurst (https://www.gutenberg.org/ebooks/34856)
• republic: Woman and the Republic, Helen Kendrick Johnson (https://www.gutenberg.org/ebooks/7300)
• unexpurgated: The Unexpurgated Case Against Woman Suffrage, Almroth Wright
(https://www.gutenberg.org/ebooks/5183)
First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]
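# Install tm once, then load it in every new session
install.packages("tm")
library(tm)
# Check which folder R is currently working in
getwd()
# Read every file in the folder into a corpus
texts <- Corpus(DirSource("[path to working directory]"))
# Print the fourth text to the console
writeLines(as.character(texts[[4]]))
?tm_map
# List the transformations that tm provides
getTransformations()
# Clean the corpus one step at a time
texts1 <- tm_map(texts, removePunctuation)
texts2 <- tm_map(texts1, removeNumbers)
texts3 <- tm_map(texts2, content_transformer(tolower))
texts4 <- tm_map(texts3, removeWords, stopwords("english"))
texts_final <- tm_map(texts4, stripWhitespace)
# Print the fourth text again to see the effect of cleaning
writeLines(as.character(texts_final[[4]]))
# Count how often each word appears in each text
dtm <- DocumentTermMatrix(texts_final)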
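To see what each cleaning step does, it can help to apply rough base R equivalents to a single sentence. The sketch below is an approximation and not how tm implements these transformations: the sample sentence and the three-word stopword list are invented for illustration, whereas the list returned by `stopwords("english")` is much longer.

```r
# A made-up sentence standing in for one document in the corpus
x <- "In 1913, the Women   demanded the Vote!"

# Roughly removePunctuation: delete punctuation characters
x1 <- gsub("[[:punct:]]", "", x)

# Roughly removeNumbers: delete digits
x2 <- gsub("[[:digit:]]", "", x1)

# tolower: convert everything to lower case
x3 <- tolower(x2)

# Roughly removeWords: delete whole words found in a stopword list
# (hypothetical mini list; tm's English list is far longer)
my_stopwords <- c("in", "the", "a")
pattern <- paste0("\\b(", paste(my_stopwords, collapse = "|"), ")\\b")
x4 <- gsub(pattern, "", x3, perl = TRUE)

# Roughly stripWhitespace: squeeze runs of spaces into a single space
x5 <- gsub("[[:space:]]+", " ", x4)

print(x5)
```

Printing `x1` to `x4` in turn shows why the order matters: lower-casing comes before stopword removal here because the stopword list is written in lower case.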
Help me! (1/3)
R Communities
#rstats (Twitter): https://twitter.com/hashtag/rstats
Forwards: https://forwards.github.io
R-Bloggers: https://www.r-bloggers.com
R-Ladies: https://rladies.org
r/rstats: https://www.reddit.com/r/rstats
RStudio Community: https://community.rstudio.com
Stack Overflow: https://stackoverflow.com/questions/tagged/r
Help me! (2/3)
R Resources
Matthew Jockers, Text Analysis with R for Students of Literature (New York: Springer, 2014)
https://www.matthewjockers.net/text-analysis-with-r-for-students-of-literature/
LinkedIn Learning: R: https://www.linkedin.com/learning/topics/r
Emmanuel Paradis, R for Beginners (2005): https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
Emma Rand, ‘Reproducible Analyses in R’, N8 CIR (2020): https://n8cir.org.uk/events/event-resource/analyses-r
W. N. Venables, D. M. Smith, and the R Core Team, An Introduction to R (2021):
https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
Help me! (3/3)
R Packages for Text Analysis
corpustools (tokenised text analysis): https://cran.r-project.org/web/packages/corpustools
gutenbergr (searching/downloading Project Gutenberg): https://cran.r-project.org/web/packages/gutenbergr
quanteda (quantitative text analysis): https://cran.r-project.org/web/packages/quanteda/index.html
stylo (stylometry): https://cran.r-project.org/web/packages/stylo
syuzhet (sentiment analysis): https://cran.r-project.org/web/packages/syuzhet/index.html
tidytext (a bit of everything!): https://cran.r-project.org/web/packages/tidytext
tm (text mining – what we’ve done here): https://cran.r-project.org/web/packages/tm/index.html
If you’re interested in stylometry specifically…
The Digital Humanities Summer Institute is offering its annual ‘Stylometry with R’
workshop FREE and ASYNCHRONOUSLY this year (14-18 June 2021)!
Details and registration at
https://dhsi.org/dhsi-2021-online-edition/dhsi-2021-online-edition-workshops.
Session 2:
Charts, Clouds, and Confidence
Image: https://pixabay.com/illustrations/rainbow-cloud-sunset-colorful-sky-5389074/
Session 2 Agenda
1. Any questions from last week?
2. Review of last week’s session (i.e. cleaning text)
3. Counting words
4. Plotting results
5. Making word clouds
6. Wrapping up
CC Images: https://thenounproject.com/term/graph/21394; https://thenounproject.com/term/word-cloud/195993
First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]
install.packages("tm")
library(tm)
getwd()
texts <- Corpus(DirSource("[path to working directory]"))
writeLines(as.character(texts[[4]]))
?tm_map
getTransformations()
texts1 <- tm_map(texts, removePunctuation)
texts2 <- tm_map(texts1, removeNumbers)
texts3 <- tm_map(texts2, content_transformer(tolower))
texts4 <- tm_map(texts3, removeWords, stopwords("english"))
texts_final <- tm_map(texts4, stripWhitespace)
writeLines(as.character(texts_final[[4]]))
dtm <- DocumentTermMatrix(texts_final)
Getting word frequencies and associations:
freq <- colSums(as.matrix(dtm))
freq[1:10]
freq_d <- sort(freq, decreasing=TRUE)
freq_d[1:10]
findFreqTerms(dtm, lowfreq=100)
findAssocs(dtm, "women", 0.95)
?findAssocs
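It may help to see what `colSums()` and `sort()` do on a tiny hand-made table before running them on the full corpus. The sketch below assumes a toy document-term matrix with invented counts; the real `dtm` has one row per text and one column per distinct word.

```r
# Toy document-term matrix: rows are documents, columns are words
# (all counts are invented for illustration)
toy <- matrix(
  c(4, 1, 0,
    2, 3, 5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("women", "vote", "law"))
)

# Adding up each column gives the total count of each word across documents
freq <- colSums(toy)

# Sorting in decreasing order puts the most frequent words first
freq_d <- sort(freq, decreasing = TRUE)
print(freq_d)

# Words appearing at least 5 times overall, similar in spirit to findFreqTerms()
names(freq)[freq >= 5]
```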
Making a bar chart (and then making it look nice):
barplot(freq_d[1:10])
?barplot
install.packages("RColorBrewer")
library(RColorBrewer)
?RColorBrewer
display.brewer.all()
cols <- brewer.pal(8, "Spectral")
barplot(freq_d[1:10], col=cols, main="My Cool Plot", xlab="Word", ylab="Instances")
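Long words often overlap along the horizontal axis of a bar chart. As a hedged sketch, the example below uses invented counts and base R's built-in `rainbow()` palette (swapped in for RColorBrewer so that no extra package is needed), and rotates the labels with `las = 2`.

```r
# Invented word counts, already sorted from most to least frequent
freq_d <- c(women = 60, suffrage = 45, government = 30, vote = 25)

# One colour per bar from a built-in palette
cols <- rainbow(length(freq_d))

# las = 2 draws the axis labels perpendicular to the axis so they do not overlap;
# barplot() also returns the horizontal midpoint of each bar
mids <- barplot(freq_d, col = cols, las = 2,
                main = "My Cool Plot", ylab = "Instances")
```

If a palette has fewer colours than there are bars, R reuses the colours from the start of the palette.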
Making a word cloud (and then making it look nice):
install.packages("wordcloud")
library(wordcloud)
m <- as.matrix(dtm)
words <- sort(colSums(m), decreasing=TRUE)
df <- data.frame(word=names(words), freq=words)
?data.frame
wordcloud(words=df$word, freq=df$freq, max.words=100, random.order=FALSE, colors=cols)
?wordcloud
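The `data.frame()` step turns a named vector of counts into a two-column table, which is the shape passed to `wordcloud()` on this slide. A small sketch with invented counts shows the result:

```r
# Invented counts in a named vector, like the output of sort(colSums(...))
words <- c(women = 60, suffrage = 45, vote = 25)

# names(words) supplies the word column; the values supply the freq column
df <- data.frame(word = names(words), freq = words)

# Each row pairs a word with its count
print(df)

# The $ operator pulls out one column by name
df$word
df$freq
```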
Thank you!

find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
huseindihon
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
Kief Morris
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
alexjohnson7307
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 

Recently uploaded (20)

Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and OllamaTirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
 
July Patch Tuesday
July Patch TuesdayJuly Patch Tuesday
July Patch Tuesday
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 

R in the Humanities: Text Analysis

  • 1. R in the Humanities: Text Analysis Dr Leah Henrickson Lecturer in Digital Media School of Media and Communication University of Leeds L.R.Henrickson@leeds.ac.uk twitter.com/leahhenrickson
  • 2. Who am I? • A Lecturer in Digital Media • A book historian • A digital humanist • Canadian 🍁 L.R.Henrickson@leeds.ac.uk twitter.com/leahhenrickson
  • 4. Session 1: Gettin’ to Grips with R CC Image: https://en.wikipedia.org/wiki/File:Piratey,_vector_version.svg
  • 5. Overview This course is a gentle introduction to R for text analysis. Over the course of two sessions you will be taught the basics of the powerful programming language before being provided with hands-on experience analysing long-form text in the RStudio development environment. By the end of the course, you will be able to: • Navigate the RStudio development environment • Prepare long-form prose texts for computational analysis using R • Conduct basic computational analyses of long-form prose texts • Construct and explain visualisations of computed results • Critically apply computational text analysis to complement other analytical methods To complete this course you will need to install: • R version 3.6 or higher (download at https://www.r-project.org) • RStudio Desktop: Open Source Edition 1.2 or higher (download at https://www.rstudio.com/products/rstudio)
  • 6. Session 1 Agenda 1. What are R and RStudio? 2. What can R help you do? 3. A quick note about Computational Literary Analysis 4. Getting started with R 5. Cleaning text CC Image: https://pixabay.com/photos/dog-laptop-computer-glasses-2983021
  • 7. What are R and RStudio? R is: • a programming language • a software environment • a really fancy calculator • free/open source Download: https://cran.r-project.org/mirrors.html RStudio is: • an integrated development environment (IDE) • a great way to make your coding experiences easier, more colourful, and more fun! Download: https://www.rstudio.com/products/rstudio/download
  • 8. What can R help you do? • Count words • Find linguistic patterns within and across texts • Compare texts • Make pretty pictures But it’s still up to you to explain results. Also, is R always the most appropriate tool? CC Image: https://pixabay.com/photos/letters-tiles-word-game-crossword-4938486
  • 9. A quick note about Computational Literary Analysis (CLS) CLS has a long history (for example, Father Roberto Busa, ~1940s), but has been criticised for: • Misinterpretation of statistical data (Da) • Unchecked enthusiasm for technological ‘hype’ (Kirsch) • Turning literature into data and neglecting reception of works (Marche) Da, Nan Z. “The Computational Case against Computational Literary Studies.” Critical Inquiry, vol. 45, 2019, pp. 601-639. Kirsch, Adam. “Technology Is Taking Over English Departments.” The New Republic, 2014, https://newrepublic.com/article/117428/limits-digital-humanities-adam-kirsch. Accessed 21 December 2020. Marche, Stephen. “Literature Is not Data: Against Digital Humanities.” The Los Angeles Review of Books, 2012, https://lareviewofbooks.org/article/literature-is-not-data-against-digital-humanities. Accessed 21 December 2020. CC Image: https://melissaterras.org/2013/10/15/for-ada-lovelace-day-father-busas-female-punch-card-operatives
  • 12. Source (write your script) Console (run your script) Environment (your data) Everything else!
  • 13. The Basics (1/2) Calculating • 10 + 2 (spaces optional) • 10 - 2 • 10 * 2 • 10 / 2 Strings and Things • 1:50 • print("Hello world!!") • [variable name] <- c(1, 2, 3) • [variable name][2] Meme: https://knowyourmeme.com/memes/math-lady-confused-lady
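The expressions on this slide can be typed straight into the console. A minimal sketch follows; `one_to_fifty`, `my_numbers` and `second` are illustrative names standing in for `[variable name]`, not names used in the course.

```r
# Arithmetic: R evaluates each expression and prints the result
10 + 2
10 - 2
10 * 2
10 / 2

# The colon operator builds a sequence of whole numbers
one_to_fifty <- 1:50

# Text must sit inside straight quotation marks
print("Hello world!!")

# c() combines values into a vector (my_numbers is a made-up name)
my_numbers <- c(1, 2, 3)

# Square brackets pick out an element by position; R counts from 1
second <- my_numbers[2]
second
```

Note that curly quotation marks pasted from a slide or a word processor will cause an error, so it is safer to retype them in RStudio.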
  • 14. The Basics (2/2) • Data types: character, numeric, integer, logical, complex • Data structures: vector, list, matrix, data frame, factors • Keep notes using # • Need help? • ?____________ • help() • install.packages("[name of package]") Meme: https://www.reddit.com/r/ProgrammerHumor/comments/8w54mx/code_comments_be_like
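One way to get a feel for the data types and structures listed above is to build a small example of each and ask R what it is. The values and object names below are invented purely for illustration.

```r
# Data types: class() reports how R is storing a value
class("suffrage")   # "character"
class(3.14)         # "numeric"
class(2L)           # "integer" (the L suffix marks a whole number)
class(TRUE)         # "logical"
class(1 + 2i)       # "complex"

# Data structures (all values here are made up)
v <- c(10, 20, 30)                           # vector: values of one type
l <- list(name = "example", count = 3)       # list: can mix types
m <- matrix(1:6, nrow = 2)                   # matrix: 2 rows, 3 columns
d <- data.frame(word = c("women", "vote"),   # data frame: a table whose
                freq = c(12, 7))             # columns can differ in type
f <- factor(c("US", "UK", "US"))             # factor: categorical labels

# Notes start with #; help pages open with ? or help(), for example:
# ?matrix
# help("data.frame")
```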
  • 15. Tools > Global Options > Appearance (You will need to restart RStudio to apply these changes).
  • 16. Let’s clean some text! CC Image: https://thenounproject.com/term/cleaning/199037
  • 17. You can use whatever corpus you’d like for this course. However, I have prepared a corpus of six texts for you. You may download the corpus at http://tinyurl.com/n8texts. This corpus includes six public domain texts (1870-1914) about the women’s suffrage movement in the United States and the United Kingdom: • debate: Debate on Woman Suffrage in the Senate of the United States (https://www.gutenberg.org/ebooks/11114) • femalesuffrage: Female Suffrage: A Letter to the Christian Women of America, Susan Fenimore Cooper (https://www.gutenberg.org/ebooks/2157) • myownstory: My Own Story, Emmeline Pankhurst (https://www.gutenberg.org/ebooks/34856) • republic: Woman and the Republic, Helen Kendrick Johnson (https://www.gutenberg.org/ebooks/7300) • unexpurgated: The Unexpurgated Case Against Woman Suffrage, Almroth Wright (https://www.gutenberg.org/ebooks/5183)
  • 18. First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]
      install.packages("tm")
      library(tm)
      getwd()
      # Load every file in the folder as one corpus
      texts <- Corpus(DirSource("[path to working directory]"))
      # Print the fourth text to check that it loaded
      writeLines(as.character(texts[[4]]))
      ?tm_map
      getTransformations()
      # Clean the corpus one step at a time
      texts1 <- tm_map(texts, removePunctuation)
      texts2 <- tm_map(texts1, removeNumbers)
      texts3 <- tm_map(texts2, content_transformer(tolower))
      texts4 <- tm_map(texts3, removeWords, stopwords("english"))
      texts_final <- tm_map(texts4, stripWhitespace)
      # Print the fourth text again to see the cleaned version
      writeLines(as.character(texts_final[[4]]))
      dtm <- DocumentTermMatrix(texts_final)
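To see what each cleaning step does, it can help to apply roughly equivalent base R operations to a single sentence. This is only an approximation of the tm transformations, not their actual implementation; the sentence and the two-word stop list are made up for the example (the list returned by `stopwords("english")` is much longer).

```r
# A made-up sentence standing in for one document in the corpus
raw <- "In 1913, the  Suffragettes marched; the Senate DEBATED!"

# Hypothetical, deliberately tiny stop word list
stop_words <- c("in", "the")

# Compare: removePunctuation
no_punct <- gsub("[[:punct:]]+", "", raw)

# Compare: removeNumbers
no_numbers <- gsub("[[:digit:]]+", "", no_punct)

# Compare: content_transformer(tolower)
lowered <- tolower(no_numbers)

# Compare: removeWords; \\b restricts matches to whole words
pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
no_stops <- gsub(pattern, "", lowered, perl = TRUE)

# Compare: stripWhitespace; trimws() also removes spaces at the ends
cleaned <- trimws(gsub("[[:space:]]+", " ", no_stops))

cleaned
```

Printing each intermediate object (`no_punct`, `no_numbers`, and so on) shows how the text changes at every stage, much like comparing `texts[[4]]` with `texts_final[[4]]`.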
  • 19. Help me! (1/3) R Communities #rstats (Twitter): https://twitter.com/hashtag/rstats Forwards: https://forwards.github.io R-Bloggers: https://www.r-bloggers.com R-Ladies: https://rladies.org r/rstats: https://www.reddit.com/r/rstats RStudio Community: https://community.rstudio.com Stack Overflow: https://stackoverflow.com/questions/tagged/r
  • 20. Help me! (2/3) R Resources Matthew Jockers, Text Analysis with R for Students of Literature (New York: Springer, 2014) https://www.matthewjockers.net/text-analysis-with-r-for-students-of-literature/ LinkedIn Learning: R: https://www.linkedin.com/learning/topics/r Emmanuel Paradis, R for Beginners (2005): https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf Emma Rand, ‘Reproducible Analyses in R’, N8 CIR (2020): https://n8cir.org.uk/events/event-resource/analyses-r W. N. Venables, D. M. Smith, and the R Core Team, An Introduction to R (2021): https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
  • 21. Help me! (3/3) R Packages for Text Analysis corpustools (tokenised text analysis): https://cran.r-project.org/web/packages/corpustools gutenbergr (searching/downloading Project Gutenberg): https://cran.r-project.org/web/packages/gutenbergr quanteda (quantitative text analysis): https://cran.r-project.org/web/packages/quanteda/index.html stylo (stylometry): https://cran.r-project.org/web/packages/stylo syuzhet (sentiment analysis): https://cran.r-project.org/web/packages/syuzhet/index.html tidytext (a bit of everything!): https://cran.r-project.org/web/packages/tidytext tm (text mining – what we’ve done here): https://cran.r-project.org/web/packages/tm/index.html
  • 22. If you’re interested in stylometry specifically… The Digital Humanities Summer Institute is offering its annual ‘Stylometry with R’ workshop FREE and ASYNCHRONOUSLY this year (14-18 June 2021)! Details and registration at https://dhsi.org/dhsi-2021-online-edition/dhsi-2021-online-edition-workshops.
  • 23. Session 2: Charts, Clouds, and Confidence Image: https://pixabay.com/illustrations/rainbow-cloud-sunset-colorful-sky-5389074/
  • 24. Session 2 Agenda 1. Any questions from last week? 2. Review of last week’s session (i.e. cleaning text) 3. Counting words 4. Plotting results 5. Making word clouds 6. Wrapping up CC Images: https://thenounproject.com/term/graph/21394; https://thenounproject.com/term/word-cloud/195993
  • 25. First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]
      install.packages("tm")
      library(tm)
      getwd()
      # Load every file in the folder as one corpus
      texts <- Corpus(DirSource("[path to working directory]"))
      # Print the fourth text to check that it loaded
      writeLines(as.character(texts[[4]]))
      ?tm_map
      getTransformations()
      # Clean the corpus one step at a time
      texts1 <- tm_map(texts, removePunctuation)
      texts2 <- tm_map(texts1, removeNumbers)
      texts3 <- tm_map(texts2, content_transformer(tolower))
      texts4 <- tm_map(texts3, removeWords, stopwords("english"))
      texts_final <- tm_map(texts4, stripWhitespace)
      # Print the fourth text again to see the cleaned version
      writeLines(as.character(texts_final[[4]]))
      dtm <- DocumentTermMatrix(texts_final)
  • 26. Getting word frequencies and associations:
      freq <- colSums(as.matrix(dtm))
      freq[1:10]
      freq_d <- sort(freq, decreasing=TRUE)
      freq_d[1:10]
      findFreqTerms(dtm, lowfreq=100)
      findAssocs(dtm, "women", 0.95)
      ?findAssocs
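The frequency and association steps can be tried on a tiny hand-made matrix before running them on the full corpus. The sketch below assumes nothing beyond base R; the counts, document names and terms are invented, and the matrix simply mirrors the layout of `as.matrix(dtm)` (documents in rows, terms in columns).

```r
# Hypothetical counts: rows are documents, columns are terms
m <- matrix(
  c(4, 2, 0,
    6, 3, 1,
    2, 1, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("women", "vote", "senate"))
)

# Total occurrences of each term across all documents
freq <- colSums(m)

# Most frequent terms first
freq_d <- sort(freq, decreasing = TRUE)
freq_d

# Terms occurring at least 7 times in total, the same kind of
# question that findFreqTerms(dtm, lowfreq = 7) answers
names(freq)[freq >= 7]

# Correlation between the "women" column and every column;
# findAssocs() reports correlation scores of this kind
assoc <- cor(m)["women", ]
assoc
```

In this toy matrix "vote" rises and falls exactly in step with "women", so their correlation is 1, while "senate" is negatively correlated. With only a handful of documents, as in the course corpus, high correlations are easy to come by and should be interpreted cautiously.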
  • 27. Making a bar chart (and then making it look nice):
      barplot(freq_d[1:10])
      ?barplot
      install.packages("RColorBrewer")
      library(RColorBrewer)
      ?RColorBrewer
      display.brewer.all()
      cols <- brewer.pal(8, "Spectral")
      barplot(freq_d[1:10], col=cols, main="My Cool Plot", xlab="Word", ylab="Instances")
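The chart above uses eight colours for ten bars, so the first two colours are reused. If one colour per bar is preferred, a possible alternative is base R's `hcl.colors()` (available from R 3.6), sketched below with invented word counts. This swaps in a different palette function from the one on the slide.

```r
# Hypothetical word counts, already sorted from most to least frequent
freq_d <- c(women = 120, suffrage = 95, vote = 80, men = 74, law = 60,
            state = 52, right = 47, government = 41, question = 38,
            political = 30)

# Ask for exactly as many colours as there are bars
cols <- hcl.colors(length(freq_d), palette = "Spectral")

# las = 2 turns the word labels sideways so longer words stay readable;
# barplot() also returns the horizontal position of each bar
bar_positions <- barplot(freq_d, col = cols, las = 2,
                         main = "Ten most frequent words",
                         ylab = "Instances")
```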
  • 28. Making a word cloud (and then making it look nice):
      install.packages("wordcloud")
      library(wordcloud)
      # Named dtm_matrix rather than matrix to avoid masking R's matrix() function
      dtm_matrix <- as.matrix(dtm)
      words <- sort(colSums(dtm_matrix), decreasing=TRUE)
      df <- data.frame(word=names(words), freq=words)
      ?data.frame
      # cols is the palette created for the bar chart
      wordcloud(words=df$word, freq=df$freq, max.words=100, random.order=FALSE, colors=cols)
      ?wordcloud
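Before plotting, it can be worth checking that the data frame has the shape expected: one row per word, with the word as text and its count as a number. A minimal sketch with invented counts, assuming only base R:

```r
# Hypothetical counts standing in for
# sort(colSums(as.matrix(dtm)), decreasing = TRUE)
words <- c(women = 120, suffrage = 95, vote = 80)

# stringsAsFactors = FALSE keeps the word column as plain text
# (before R 4.0 the default converted text columns to factors);
# row.names = NULL replaces the word row names with 1, 2, 3
df <- data.frame(word = names(words), freq = words,
                 stringsAsFactors = FALSE, row.names = NULL)

# Inspect the structure and the first rows
str(df)
head(df, 2)
```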
  • 33. Thank you! Dr Leah Henrickson Lecturer in Digital Media School of Media and Communication University of Leeds L.R.Henrickson@leeds.ac.uk twitter.com/leahhenrickson