SlideShare a Scribd company logo
Before the Workshop:
go.unc.edu/topmod
A GENTLE INTRODUCTION TO
Text Analysis I
Lorin Bruckner
What is text analysis
and why do we use it?
What is Text Analysis
• What challenges do pediatric health providers
experience in their work?
• How do people talk about vaccines on Twitter?
• Which laws in North Carolina are Jim Crow laws?
• How do posts on social media affect the impact
of research articles?
• What information about cultural objects can be
found in eBay listings?
A Tool for Answering Research Questions
What is Text Analysis
Text analysis is the process by which
meaningful information is extracted
from unstructured text data.
What is Text Analysis
• Used in Social Sciences and Humanities fields
• Technique for systematically identifying patterns in
different types of communication
• Difficult and time consuming to do manually without
the help of digital tools
• Manifest content – what is said
quantitative techniques work better
• Latent content – what is meant
qualitative techniques work better
Text Analysis is related to Content Analysis
Qualitative Text Analysis
Qualitative Text Analysis
Qualitative Text Analysis Technique: Thematic Coding
Qualitative Text Analysis
• Gathered information on the experiences
and perceptions of pediatric speech
language pathologists through in-depth
interviews with professionals
• Interview transcripts were thematically
coded using qualitative text analysis
software
EXAMPLE
Speech-Language Pathologists in Pediatric
Palliative Care: An International Study of
Perceptions and Experiences
Krikheli, L., Erickson, S., Carey, L. B., Carey-Sargeant, C., & Mathisen, B. A.
(2021). Speech-language pathologists in pediatric palliative care: An
international study of perceptions and experiences. American Journal of
Speech - Language Pathology (Online), 30(1), 150-168. doi:https://doi-
org.libproxy.lib.unc.edu/10.1044/2020_AJSLP-20-00090
Qualitative Text Analysis
• Decide what is to be coded – entire
response? Individual sentences?
• Decide what codes to use –
deductive or inductive?
• Select text and apply codes
• Analyze and visualize results
The Coding Process
Qualitative Text Analysis
Visualization Examples
Qualitative Text Analysis
• NVivo
• ATLAS.ti
• MAXQDA
• Dedoose
Tools
odum.unc.edu/qualitative-research
• Word
• Excel
Quantitative Text Analysis
Quantitative Text Analysis
• Bag of Words
• Classification
• Named Entity Recognition
• Sentiment Analysis
Quantitative Text Analysis Techniques
Bag of Words
Bag of Words
• Examined vaccine-related tweets
from a period in 2019 when a measles
outbreak occurred in the U.S.
• Used bag of words methods that
focus on how often words occur in
the text, a.k.a. term frequency
EXAMPLE
Studying Public Perception about
Vaccination: A Sentiment Analysis of Tweets
Raghupathi, V., Ren, J., & Raghupathi, W. (2020). Studying public
perception about vaccination: A sentiment analysis of
tweets. International Journal of Environmental Research and Public
Health, 17(10), 3464. doi:https://doi.org/10.3390/ijerph17103464
Bag of Words
What does “Bag of Words” Mean?
Corpus
(e.g., collection of tweets)
Nearly ALL innate structure is removed from the text including punctuation, white space and
paragraphs, the order in which words occur, etc., so we end up with an unstructured “bag” full of
jumbled-up words.
Document
(e.g. single tweet)
Bag of Words
Bag of Words
• Refers to transformations that
must be performed on the text
before analysis can occur
Preprocessing
• Tokenization
• Removal of stop words
• Stemming/Lemmatization
Common Preprocessing Steps
Bag of Words
• Breaks a document or corpus
into separate clusters of
text that become the unit of
analysis
Tokenization
• Word
• N-gram
Types of Tokenization
bigrams trigrams
This is a sentence. this is a sentence
Bag of Words
• List of words removed from
the corpus
• Include extremely common
words that aren’t usually very
informative
• Articles, pronouns, “to be”
verbs, etc.
Stop Words with stop words without stop words
Bag of Words
• Algorithm removes certain word
endings so similar words can be
counted together (cars -> car)
• Can result in inaccurate word
frequencies (caring -> car)
Stemming
• Takes linguistic morphology into
account, providing better accuracy
• Algorithm relies on detailed
dictionaries
Lemmatization
Bag of Words
• Often needs to be normalized by the
total number of words in a document
• Number of times a term appears
divided by the total number of terms
Term Frequency Method
• Uncommon terms are more
informative than common ones
• TF-IDF takes into account both the
frequency of a term in a
document and its overall scarcity
in the corpus
TF-IDF Method
Bag of Words
1. Go to voyant-tools.org
2. Click on
3. Choose
Tips
• What is the corpus? What are the documents?
How many documents are in the corpus? (Hint:
Summary tool)
• What is the most common 4-gram in the corpus?
In what document does it have the highest relative
frequency? (Hint: Phrases tool and Trends tool)
• Edit the stop words so that the top 20 terms are
more informative. (Hint: Terms tool - Options)
• Find a term that Shakespeare uses more often in
his early works than his later works. (Hint: Terms
tool and Trends tool)
• Look at the highest Significance scores for
Document Terms. Why are they all character
names? (Hint: Document Terms tool - Help)
EXERCISE
Shakespeare’s works in Voyant
Open tool in a new tab (click Export)
Change tool
Change options (more at bottom of tool)
Tool help
Classification
Classification
• Discovered Jim Crow and racially-based
legislation signed into law in North
Carolina between 1866 and 1967
• Used supervised and unsupervised
classification techniques to identify laws
with race-based language and the type of
legislation covered by those laws
EXAMPLE
On the Books: Jim Crow and Algorithms
of Resistance
Henley, A., Jansen, M., Bruckner, L., Byers, N., & Dalwadi, R. (2020). On the
Books: Jim Crow and Algorithms of Resistance White Paper.
https://doi.org/10.17615/hvz4-sr14
Classification
• Separate documents into different
groups depending on the language
used in the text
• Often called topic modeling when the
groups represent topics
Uses for Classification
• Supervised
• Unsupervised
• Rule based
Types of Classification
? ? ? ?
? ? ? ?
A B C D
A B C D
Classification
• Portion of the corpus must be
manually labeled to provide a training
set (very time consuming)
• Training set serves as an example of
correctly classified text (it’s the
supervisor)
• Machine learning algorithm uses the
labels supplied by the training set to
predict classifications for documents
in a test set, and eventually unlabeled
documents
• Often, many different algorithms are
trained, tested and compared to find
the best results, e.g., Naïve Bayes,
Random Forest, Support Vector
Machine, XGBoost
Supervised
ML Algorithm
Labeled Docs
Test Set
? ? ? ?
? ? ? ?
Training Set
A B C D
A B C D
A B C D
Predictions
A B C D
A B C D
A
A
A
A
A
B
B
B
C
C
C
C D
D
A
A
B
B C
Classification
• Does not require manually labeled text
and is less time consuming
• Due to lack of labeling, results are often
less accurate and harder to evaluate
• Researcher determines number of
groups the algorithm will organize the
data into
• Machine learning algorithm looks for
similarities in the data and creates
groups based on those similarities
• Many different types of algorithms
including Latent Dirichlet allocation, K-
Means clustering, word embeddings, etc.
Unsupervised Unlabeled Docs Number of Groups
A B C D
ML Algorithm
Predictions
A B C D
A B C D
Classification
1. Follow the steps at go.unc.edu/topmod.
2. Go to your Desktop and open the
output_html folder.
3. Double-click on all_topics.html
Tips
• If you had to come up with a name for
each topic, what would it be?
• Click on each topic and look at the top
five laws within it. Based on the text of
those laws, how representative are the
topic names you picked?
• How useful do you think the results of the
topic modeling algorithm are?
EXERCISE
Topic Modeling Tool
To learn more about the topic modeling
algorithm we are using, visit:
go.unc.edu/lda
(written by Google employee, Ria Kulshrestha)

More Related Content

Similar to A Gentle Introduction to Text Analysis I

Using Qualitative Methods for Library Evaluation: An Interactive Workshop
Using Qualitative Methods for Library Evaluation: An Interactive WorkshopUsing Qualitative Methods for Library Evaluation: An Interactive Workshop
Using Qualitative Methods for Library Evaluation: An Interactive Workshop
Lynn Connaway
 
1Week 5Critiquing Research Articles to Prepare an Annotated B.docx
1Week 5Critiquing Research Articles to Prepare an Annotated B.docx1Week 5Critiquing Research Articles to Prepare an Annotated B.docx
1Week 5Critiquing Research Articles to Prepare an Annotated B.docx
felicidaddinwoodie
 
Content analysis
Content analysisContent analysis
Content analysis
Atul Thakur
 
Content analysis
Content analysisContent analysis
Content analysis
Atul Thakur
 
Polisciguide2017
Polisciguide2017Polisciguide2017
Polisciguide2017
Annelise Sklar
 
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
hajinouha0
 
Qualitative research methods kanhaiya sapkota
Qualitative research methods kanhaiya sapkotaQualitative research methods kanhaiya sapkota
Qualitative research methods kanhaiya sapkota
Tribhuvan University, Kathmandu, Nepal
 
Peer reviews
Peer reviewsPeer reviews
Peer reviews
Alexander Serebrenik
 
DSWBL Week 1: Using evidence in writing
DSWBL Week 1: Using evidence in writingDSWBL Week 1: Using evidence in writing
DSWBL Week 1: Using evidence in writing
ADT U2B
 
Pemanfaatan TIK.pdf
Pemanfaatan TIK.pdfPemanfaatan TIK.pdf
Pemanfaatan TIK.pdf
AGUSAANADRIANSYAH
 
Perceptual Data_04182016
Perceptual Data_04182016Perceptual Data_04182016
Perceptual Data_04182016Kunal Dash
 
SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence
SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence
SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence
Marina Santini
 
EDS for JIBS
EDS for JIBSEDS for JIBS
EDS for JIBS
CliveRWright
 
Week 1: Using evidence in writing
Week 1: Using evidence in writingWeek 1: Using evidence in writing
Week 1: Using evidence in writing
ADT U2B
 
Content Analysis vs secondary analysis
Content Analysis vs secondary analysisContent Analysis vs secondary analysis
Content Analysis vs secondary analysis
Dr. Cupid Lucid
 
Analysing Qualitative Data
Analysing Qualitative DataAnalysing Qualitative Data
Analysing Qualitative Data
Mike Crabb
 
Content analysis
Content analysis Content analysis
Content analysis
ayesha shah
 
Thematic content analysis in psychology
Thematic content analysis in psychologyThematic content analysis in psychology
Thematic content analysis in psychology
Dr. Chinchu C
 
Analyzing+Methods+sections.pptx
Analyzing+Methods+sections.pptxAnalyzing+Methods+sections.pptx
Analyzing+Methods+sections.pptx
NatalieCamposMartine
 

Similar to A Gentle Introduction to Text Analysis I (20)

Using Qualitative Methods for Library Evaluation: An Interactive Workshop
Using Qualitative Methods for Library Evaluation: An Interactive WorkshopUsing Qualitative Methods for Library Evaluation: An Interactive Workshop
Using Qualitative Methods for Library Evaluation: An Interactive Workshop
 
1Week 5Critiquing Research Articles to Prepare an Annotated B.docx
1Week 5Critiquing Research Articles to Prepare an Annotated B.docx1Week 5Critiquing Research Articles to Prepare an Annotated B.docx
1Week 5Critiquing Research Articles to Prepare an Annotated B.docx
 
Content analysis
Content analysisContent analysis
Content analysis
 
Content analysis
Content analysisContent analysis
Content analysis
 
Polisciguide2017
Polisciguide2017Polisciguide2017
Polisciguide2017
 
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
 
Qualitative research methods kanhaiya sapkota
Qualitative research methods kanhaiya sapkotaQualitative research methods kanhaiya sapkota
Qualitative research methods kanhaiya sapkota
 
Peer reviews
Peer reviewsPeer reviews
Peer reviews
 
DSWBL Week 1: Using evidence in writing
DSWBL Week 1: Using evidence in writingDSWBL Week 1: Using evidence in writing
DSWBL Week 1: Using evidence in writing
 
Week4 bus 305
Week4 bus 305Week4 bus 305
Week4 bus 305
 
Pemanfaatan TIK.pdf
Pemanfaatan TIK.pdfPemanfaatan TIK.pdf
Pemanfaatan TIK.pdf
 
Perceptual Data_04182016
Perceptual Data_04182016Perceptual Data_04182016
Perceptual Data_04182016
 
SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence
SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence
SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence
 
EDS for JIBS
EDS for JIBSEDS for JIBS
EDS for JIBS
 
Week 1: Using evidence in writing
Week 1: Using evidence in writingWeek 1: Using evidence in writing
Week 1: Using evidence in writing
 
Content Analysis vs secondary analysis
Content Analysis vs secondary analysisContent Analysis vs secondary analysis
Content Analysis vs secondary analysis
 
Analysing Qualitative Data
Analysing Qualitative DataAnalysing Qualitative Data
Analysing Qualitative Data
 
Content analysis
Content analysis Content analysis
Content analysis
 
Thematic content analysis in psychology
Thematic content analysis in psychologyThematic content analysis in psychology
Thematic content analysis in psychology
 
Analyzing+Methods+sections.pptx
Analyzing+Methods+sections.pptxAnalyzing+Methods+sections.pptx
Analyzing+Methods+sections.pptx
 

More from UNCResearchHub

A Gentle Introduction to Text Analysis II
A Gentle Introduction to Text Analysis IIA Gentle Introduction to Text Analysis II
A Gentle Introduction to Text Analysis II
UNCResearchHub
 
Identify the Best Sites for Your Research with ArchiveGrid
Identify the Best Sites for Your Research with ArchiveGridIdentify the Best Sites for Your Research with ArchiveGrid
Identify the Best Sites for Your Research with ArchiveGrid
UNCResearchHub
 
Making the Most of Your Time in an Archive: Archival Research Management, or...
Making the Most of Your Time in an Archive:  Archival Research Management, or...Making the Most of Your Time in an Archive:  Archival Research Management, or...
Making the Most of Your Time in an Archive: Archival Research Management, or...
UNCResearchHub
 
Manage Research Images with Tropy
Manage Research Images with TropyManage Research Images with Tropy
Manage Research Images with Tropy
UNCResearchHub
 
Finding data
Finding dataFinding data
Finding data
UNCResearchHub
 
Census concepts
Census conceptsCensus concepts
Census concepts
UNCResearchHub
 
Data Visualization Tools & Resources
Data Visualization Tools & ResourcesData Visualization Tools & Resources
Data Visualization Tools & Resources
UNCResearchHub
 
Working With Infographics
Working With InfographicsWorking With Infographics
Working With Infographics
UNCResearchHub
 
Making an Impact With Data Visualization
Making an Impact With Data VisualizationMaking an Impact With Data Visualization
Making an Impact With Data Visualization
UNCResearchHub
 
You Should Really Have an Online Portfolio
You Should Really Have an Online PortfolioYou Should Really Have an Online Portfolio
You Should Really Have an Online Portfolio
UNCResearchHub
 
Interactive Visualization
Interactive VisualizationInteractive Visualization
Interactive Visualization
UNCResearchHub
 
Turning Data into Infographics: An Interactive Workshop for Problem Solvers
Turning Data into Infographics: An Interactive Workshop for Problem SolversTurning Data into Infographics: An Interactive Workshop for Problem Solvers
Turning Data into Infographics: An Interactive Workshop for Problem Solvers
UNCResearchHub
 
Orphan Works Best Practices
Orphan Works Best PracticesOrphan Works Best Practices
Orphan Works Best Practices
UNCResearchHub
 

More from UNCResearchHub (13)

A Gentle Introduction to Text Analysis II
A Gentle Introduction to Text Analysis IIA Gentle Introduction to Text Analysis II
A Gentle Introduction to Text Analysis II
 
Identify the Best Sites for Your Research with ArchiveGrid
Identify the Best Sites for Your Research with ArchiveGridIdentify the Best Sites for Your Research with ArchiveGrid
Identify the Best Sites for Your Research with ArchiveGrid
 
Making the Most of Your Time in an Archive: Archival Research Management, or...
Making the Most of Your Time in an Archive:  Archival Research Management, or...Making the Most of Your Time in an Archive:  Archival Research Management, or...
Making the Most of Your Time in an Archive: Archival Research Management, or...
 
Manage Research Images with Tropy
Manage Research Images with TropyManage Research Images with Tropy
Manage Research Images with Tropy
 
Finding data
Finding dataFinding data
Finding data
 
Census concepts
Census conceptsCensus concepts
Census concepts
 
Data Visualization Tools & Resources
Data Visualization Tools & ResourcesData Visualization Tools & Resources
Data Visualization Tools & Resources
 
Working With Infographics
Working With InfographicsWorking With Infographics
Working With Infographics
 
Making an Impact With Data Visualization
Making an Impact With Data VisualizationMaking an Impact With Data Visualization
Making an Impact With Data Visualization
 
You Should Really Have an Online Portfolio
You Should Really Have an Online PortfolioYou Should Really Have an Online Portfolio
You Should Really Have an Online Portfolio
 
Interactive Visualization
Interactive VisualizationInteractive Visualization
Interactive Visualization
 
Turning Data into Infographics: An Interactive Workshop for Problem Solvers
Turning Data into Infographics: An Interactive Workshop for Problem SolversTurning Data into Infographics: An Interactive Workshop for Problem Solvers
Turning Data into Infographics: An Interactive Workshop for Problem Solvers
 
Orphan Works Best Practices
Orphan Works Best PracticesOrphan Works Best Practices
Orphan Works Best Practices
 

Recently uploaded

Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Atul Kumar Singh
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 

Recently uploaded (20)

Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 

A Gentle Introduction to Text Analysis I

  • 2. A GENTLE INTRODUCTION TO Text Analysis I Lorin Bruckner
  • 3. What is text analysis and why do we use it?
  • 4. What is Text Analysis • What challenges do pediatric health providers experience in their work? • How do people talk about vaccines on Twitter? • Which laws in North Carolina are Jim Crow laws? • How do posts on social media affect the impact of research articles? • What information about cultural objects can be found in eBay listings? A Tool for Answering Research Questions
  • 5. What is Text Analysis Text analysis is the process by which meaningful information is extracted from unstructured text data.
  • 6. What is Text Analysis • Used in Social Sciences and Humanities fields • Technique for systematically identifying patterns in different types of communication • Difficult and time consuming to do manually without the help of digital tools • Manifest content – what is said quantitative techniques work better • Latent content – what is meant qualitative techniques work better Text Analysis is related to Content Analysis
  • 8. Qualitative Text Analysis Qualitative Text Analysis Technique: Thematic Coding
  • 9. Qualitative Text Analysis • Gathered information on the experiences and perceptions of pediatric speech language pathologists through in-depth interviews with professionals • Interview transcripts were thematically coded using qualitative text analysis software EXAMPLE Speech-Language Pathologists in Pediatric Palliative Care: An International Study of Perceptions and Experiences Krikheli, L., Erickson, S., Carey, L. B., Carey-Sargeant, C., & Mathisen, B. A. (2021). Speech-language pathologists in pediatric palliative care: An international study of perceptions and experiences. American Journal of Speech - Language Pathology (Online), 30(1), 150-168. doi:https://doi- org.libproxy.lib.unc.edu/10.1044/2020_AJSLP-20-00090
  • 10. Qualitative Text Analysis • Decide what is to be coded – entire response? Individual sentences? • Decide what codes to use – deductive or inductive? • Select text and apply codes • Analyze and visualize results The Coding Process
  • 12. Qualitative Text Analysis • NVivo • ATLAS.ti • MAXQDA • Dedoose Tools odum.unc.edu/qualitative-research • Word • Excel
  • 14. Quantitative Text Analysis • Bag of Words • Classification • Named Entity Recognition • Sentiment Analysis Quantitative Text Analysis Techniques
  • 16. Bag of Words • Examined vaccine-related tweets from a period in 2019 when a measles outbreak occurred in the U.S. • Used bag of words methods that focus on how often words occur in the text, a.k.a. term frequency EXAMPLE Studying Public Perception about Vaccination: A Sentiment Analysis of Tweets Raghupathi, V., Ren, J., & Raghupathi, W. (2020). Studying public perception about vaccination: A sentiment analysis of tweets. International Journal of Environmental Research and Public Health, 17(10), 3464. doi:https://doi.org/10.3390/ijerph17103464
  • 17. Bag of Words What does “Bag of Words” Mean? Corpus (e.g., collection of tweets) Nearly ALL innate structure is removed from the text including punctuation, white space and paragraphs, the order in which words occur, etc., so we end up with an unstructured “bag” full of jumbled-up words. Document (e.g. single tweet) Bag of Words
  • 18. Bag of Words • Refers to transformations that must be performed on the text before analysis can occur Preprocessing • Tokenization • Removal of stop words • Stemming/Lemmatization Common Preprocessing Steps
  • 19. Bag of Words • Breaks a document or corpus into separate clusters of text that become the unit of analysis Tokenization • Word • N-gram Types of Tokenization bigrams trigrams This is a sentence. this is a sentence
  • 20. Bag of Words • List of words removed from the corpus • Include extremely common words that aren’t usually very informative • Articles, pronouns, “to be” verbs, etc. Stop Words with stop words without stop words
  • 21. Bag of Words • Algorithm removes certain word endings so similar words can be counted together (cars -> car) • Can result in inaccurate word frequencies (caring -> car) Stemming • Takes linguistic morphology into account, providing better accuracy • Algorithm relies on detailed dictionaries Lemmatization
  • 22. Bag of Words • Often needs to be normalized by the total number of words in a document • Number of times a term appears divided by the total number of terms Term Frequency Method • Uncommon terms are more informative than common ones • TF-IDF takes into account both the frequency of a term in a document and its overall scarcity in the corpus TF-IDF Method
  • 23. Bag of Words 1. Go to voyant-tools.org 2. Click on 3. Choose Tips • What is the corpus? What are the documents? How many documents are in the corpus? (Hint: Summary tool) • What is the most common 4-gram in the corpus? In what document does it have the highest relative frequency? (Hint: Phrases tool and Trends tool) • Edit the stop words so that the top 20 terms are more informative. (Hint: Terms tool - Options) • Find a term that Shakespeare uses more often in his early works than his later works. (Hint: Terms tool and Trends tool) • Look at the highest Significance scores for Document Terms. Why are they all character names? (Hint: Document Terms tool - Help) EXERCISE Shakespeare’s works in Voyant Open tool in a new tab (click Export) Change tool Change options (more at bottom of tool) Tool help
  • 25. Classification • Discovered Jim Crow and racially-based legislation signed into law in North Carolina between 1866 and 1967 • Used supervised and unsupervised classification techniques to identify laws with race-based language and the type of legislation covered by those laws EXAMPLE On the Books: Jim Crow and Algorithms of Resistance Henley, A., Jansen, M., Bruckner, L., Byers, N., & Dalwadi, R. (2020). On the Books: Jim Crow and Algorithms of Resistance White Paper. https://doi.org/10.17615/hvz4-sr14
  • 26. Classification • Separate documents into different groups depending on the language used in the text • Often called topic modeling when the groups represent topics Uses for Classification • Supervised • Unsupervised • Rule based Types of Classification ? ? ? ? ? ? ? ? A B C D A B C D
  • 27. Classification • Portion of the corpus must be manually labeled to provide a training set (very time consuming) • Training set serves as an example of correctly classified text (it’s the supervisor) • Machine learning algorithm uses the labels supplied by the training set to predict classifications for documents in a test set, and eventually unlabeled documents • Often, many different algorithms are trained, tested and compared to find the best results, e.g., Naïve Bayes, Random Forest, Support Vector Machine, XGBoost Supervised ML Algorithm Labeled Docs Test Set ? ? ? ? ? ? ? ? Training Set A B C D A B C D A B C D Predictions A B C D A B C D A A A A A B B B C C C C D D A A B B C
  • 28. Classification • Does not require manually labeled text and is less time consuming • Due to lack of labeling, results are often less accurate and harder to evaluate • Researcher determines number of groups the algorithm will organize the data into • Machine learning algorithm looks for similarities in the data and creates groups based on those similarities • Many different types of algorithms including Latent Dirichlet allocation, K- Means clustering, word embeddings, etc. Unsupervised Unlabeled Docs Number of Groups A B C D ML Algorithm Predictions A B C D A B C D
  • 29. Classification 1. Follow the steps at go.unc.edu/topmod. 2. Go to your Desktop and open the output_html folder. 3. Double-click on all_topics.html Tips • If you had to come up with a name for each topic, what would it be? • Click on each topic and look at the top five laws within it. Based on the text of those laws, how representative are the topic names you picked? • How useful do you think the results of the topic modeling algorithm are? EXERCISE Topic Modeling Tool To learn more about the topic modeling algorithm we are using, visit: go.unc.edu/lda (written by Google employee, Ria Kulshrestha)