Are you interested in learning about text analysis but have little to no experience with programming languages or writing code? These two short courses will introduce you to multiple text analysis methods. We will examine real-world examples and engage in hands-on activities that don’t require running any code. These short courses are ideal for students and researchers in non-technical fields, faculty who would like to incorporate text analysis in their curriculum, or as a precursor to programming with text analysis tools.
A Gentle Introduction to Text Analysis I will cover both qualitative and quantitative text analysis methods, bag-of-words techniques and classification.
Content Analysis Overview for Persona DevelopmentPamela Rutledge
After developing an Ad Hoc persona as the core of your engagement strategy, it's important to test your assumptions against real people and real data. Content analysis is a methodology for evaluating text-based data that can be gathered from social media tools.
Using Qualitative Methods for Library Evaluation: An Interactive WorkshopOCLC
Connaway, Lynn Silipigni, and Marie L. Radford. 2016. "Using Qualitative Methods for Library Evaluation: An Interactive Workshop." Presented at the Libraries in the Digital Age (LIDA) Conference, Zadar, Croatia, June 14.
Content Analysis Overview for Persona DevelopmentPamela Rutledge
After developing an Ad Hoc persona as the core of your engagement strategy, it's important to test your assumptions against real people and real data. Content analysis is a methodology for evaluating text-based data that can be gathered from social media tools.
Using Qualitative Methods for Library Evaluation: An Interactive WorkshopOCLC
Connaway, Lynn Silipigni, and Marie L. Radford. 2016. "Using Qualitative Methods for Library Evaluation: An Interactive Workshop." Presented at the Libraries in the Digital Age (LIDA) Conference, Zadar, Croatia, June 14.
Using Qualitative Methods for Library Evaluation: An Interactive WorkshopLynn Connaway
Connaway, Lynn Silipigni, and Marie L. Radford. 2016. "Using Qualitative Methods for Library Evaluation: An Interactive Workshop." Presented at the Libraries in the Digital Age (LIDA) Conference, Zadar, Croatia, June 14.
1Week 5Critiquing Research Articles to Prepare an Annotated B.docxfelicidaddinwoodie
1
Week 5:Critiquing Research Articles to Prepare an Annotated Bibliography
As mentioned, one component of becoming an independent scholar is learning how to provide an evaluative critique of the work of other scholars. A critique of scholarly work requires your ability to use high level critical thinking skills. In addition, you must be able to write constructively and communicate your ideas well with clear and focused writing.
The purpose of this assignment is two-fold. First, you are to demonstrate your ability to clearly and precisely summarize and critically evaluate specific information from peer-reviewed resources. Secondly, you are to demonstrate your ability to clearly present that evaluative information in writing that meets academic and professional expectations. These skills will be invaluable as you go on to develop your literature review and in your journey to become an independent scholar.
The result of this activity is produce annotated bibliographies based on the two peer-reviewed journal articles related to your chosen topic (you are welcome to include more articles for practice and feedback). Use the sections and questions below to help you critique each article. You do not need to answer every single question as some questions might not apply. The questions are listed as a means to help you generate ideas as you work on critiquing each article. You might also consider using this template in the future when critically analyzing articles.
Please REMOVE the instructions and questions listed below for your paper and submit an annotated bibliography for each article.
Link to peer reviewed article one:
http://intqhc.oxfordjournals.org/content/19/6/341
1) APA reference for article #1
2)Introductionand core study elements
· Give an overview of the purpose of the study and the problem or issue discussed.
· Consider whether the problem is clearly described. Did the author(s) document and support the existence of the problem with scholarly sources and data? Were the sources credible and relevant (as defined by the readings you’ve done for this course)?
· What were the research questions?
· What were the key findings and conclusions of the study?
3) Evaluate literature reviewed
· Examine the literature reviewed by the author(s). How relevant is the) cited literature? Do certain ideas or concepts appear to be over/underemphasized? Was there any bias in language or tone of the writing? What discussions need elaboration or could be more concise? What is missing?
4) Evaluate theoretical framework
· What theoretical or conceptual framework was used as the basis for the study? What are the key variables and from which theories do they originate? Are variables well-defined? What alternative theories might support this study?
5) Evaluate methods
· What research method and designs are used in this study?
· How well are the methods described (could a reader duplicate the research process if needed)?
· Do the m ...
Sentiment analysis involves the process of automatically detecting the polarity of a text and extracting the author's reviews on the subject, and finally, classifying the text. In many research approaches, the textual data classification is done using deep learning models. This is due to the ability of deep learning models to classify a text with a high accuracy and the ability to model the sequence of textual data with word dependencies throughout the sentence. One of these deep learning models is RNN (Recurrent Neural Network). In order to use these models, the textual data and words must be converted into numerical vectors, for which various algorithms and methods have been proposed [10]. Today's pretrained word embedding libraries such as FastText have a high accuracy and quality in vector representations for words. Accordingly, in most current systems and research approaches, these libraries are used to convert the textual data to numerical vectors
Peer review is often seen as a cornerstone of modern science. We are going to discuss the current peer review practices in software engineering research, their strengths and limitations. Next we will discuss tips and tricks for writing code reviews, as well as implications for writing papers. I will also share some insights in my own reviewing practices.
SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence Marina Santini
Query logs are an important source of information to surmize users intents'. Although Karlgren (2010) points out that “There are several reasons to be cautious in drawing too far-reaching conclusions: we cannot say for sure what the users were after; [...]“, some linguistic problems could be sorted out by applying more advanced text/content analytics, such as register/sublanguage identification and terminology classification (see Friberg Heppin, 2011) . In this presentation, I will argue that query logs can be considered a digital textual genre alike emails, blogs, chats, tweets and so forth. All these genres contain unstructured information that, still today, is difficult to leverage upon satisfactorily. The hypothesis that I would like to put forward in this workshop is that query logs might be easier to exploit to extract useful information and actionable intelligence than other digital genres.
Are you interested in learning about text analysis but have little to no experience with programming languages or writing code? These two short courses will introduce you to multiple text analysis methods. We will examine real-world examples and engage in hands-on activities that don’t require running any code. These short courses are ideal for students and researchers in non-technical fields, faculty who would like to incorporate text analysis in their curriculum, or as a precursor to programming with text analysis tools.
We will continue to explore text analysis methods as introduced in A Gentle Introduction to Text Analysis I. Topics covered include Named Entity Recognition and Sentiment Analysis. There will also be a discussion on ethical problems in text analysis.
Identify the Best Sites for Your Research with ArchiveGridUNCResearchHub
Need help figuring out which archives hold the research material you need? This workshop will introduce researchers to ArchiveGrid, a database cataloging archive collections around the world. In addition, participants will learn other tips for formulating research questions and plan their research.
More Related Content
Similar to A Gentle Introduction to Text Analysis I
Using Qualitative Methods for Library Evaluation: An Interactive WorkshopLynn Connaway
Connaway, Lynn Silipigni, and Marie L. Radford. 2016. "Using Qualitative Methods for Library Evaluation: An Interactive Workshop." Presented at the Libraries in the Digital Age (LIDA) Conference, Zadar, Croatia, June 14.
1Week 5Critiquing Research Articles to Prepare an Annotated B.docxfelicidaddinwoodie
1
Week 5:Critiquing Research Articles to Prepare an Annotated Bibliography
As mentioned, one component of becoming an independent scholar is learning how to provide an evaluative critique of the work of other scholars. A critique of scholarly work requires your ability to use high level critical thinking skills. In addition, you must be able to write constructively and communicate your ideas well with clear and focused writing.
The purpose of this assignment is two-fold. First, you are to demonstrate your ability to clearly and precisely summarize and critically evaluate specific information from peer-reviewed resources. Secondly, you are to demonstrate your ability to clearly present that evaluative information in writing that meets academic and professional expectations. These skills will be invaluable as you go on to develop your literature review and in your journey to become an independent scholar.
The result of this activity is produce annotated bibliographies based on the two peer-reviewed journal articles related to your chosen topic (you are welcome to include more articles for practice and feedback). Use the sections and questions below to help you critique each article. You do not need to answer every single question as some questions might not apply. The questions are listed as a means to help you generate ideas as you work on critiquing each article. You might also consider using this template in the future when critically analyzing articles.
Please REMOVE the instructions and questions listed below for your paper and submit an annotated bibliography for each article.
Link to peer reviewed article one:
http://intqhc.oxfordjournals.org/content/19/6/341
1) APA reference for article #1
2)Introductionand core study elements
· Give an overview of the purpose of the study and the problem or issue discussed.
· Consider whether the problem is clearly described. Did the author(s) document and support the existence of the problem with scholarly sources and data? Were the sources credible and relevant (as defined by the readings you’ve done for this course)?
· What were the research questions?
· What were the key findings and conclusions of the study?
3) Evaluate literature reviewed
· Examine the literature reviewed by the author(s). How relevant is the) cited literature? Do certain ideas or concepts appear to be over/underemphasized? Was there any bias in language or tone of the writing? What discussions need elaboration or could be more concise? What is missing?
4) Evaluate theoretical framework
· What theoretical or conceptual framework was used as the basis for the study? What are the key variables and from which theories do they originate? Are variables well-defined? What alternative theories might support this study?
5) Evaluate methods
· What research method and designs are used in this study?
· How well are the methods described (could a reader duplicate the research process if needed)?
· Do the m ...
Sentiment analysis involves the process of automatically detecting the polarity of a text and extracting the author's reviews on the subject, and finally, classifying the text. In many research approaches, the textual data classification is done using deep learning models. This is due to the ability of deep learning models to classify a text with a high accuracy and the ability to model the sequence of textual data with word dependencies throughout the sentence. One of these deep learning models is RNN (Recurrent Neural Network). In order to use these models, the textual data and words must be converted into numerical vectors, for which various algorithms and methods have been proposed [10]. Today's pretrained word embedding libraries such as FastText have a high accuracy and quality in vector representations for words. Accordingly, in most current systems and research approaches, these libraries are used to convert the textual data to numerical vectors
Peer review is often seen as a cornerstone of modern science. We are going to discuss the current peer review practices in software engineering research, their strengths and limitations. Next we will discuss tips and tricks for writing code reviews, as well as implications for writing papers. I will also share some insights in my own reviewing practices.
SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence Marina Santini
Query logs are an important source of information to surmize users intents'. Although Karlgren (2010) points out that “There are several reasons to be cautious in drawing too far-reaching conclusions: we cannot say for sure what the users were after; [...]“, some linguistic problems could be sorted out by applying more advanced text/content analytics, such as register/sublanguage identification and terminology classification (see Friberg Heppin, 2011) . In this presentation, I will argue that query logs can be considered a digital textual genre alike emails, blogs, chats, tweets and so forth. All these genres contain unstructured information that, still today, is difficult to leverage upon satisfactorily. The hypothesis that I would like to put forward in this workshop is that query logs might be easier to exploit to extract useful information and actionable intelligence than other digital genres.
Are you interested in learning about text analysis but have little to no experience with programming languages or writing code? These two short courses will introduce you to multiple text analysis methods. We will examine real-world examples and engage in hands-on activities that don’t require running any code. These short courses are ideal for students and researchers in non-technical fields, faculty who would like to incorporate text analysis in their curriculum, or as a precursor to programming with text analysis tools.
We will continue to explore text analysis methods as introduced in A Gentle Introduction to Text Analysis I. Topics covered include Named Entity Recognition and Sentiment Analysis. There will also be a discussion on ethical problems in text analysis.
Identify the Best Sites for Your Research with ArchiveGridUNCResearchHub
Need help figuring out which archives hold the research material you need? This workshop will introduce researchers to ArchiveGrid, a database cataloging archive collections around the world. In addition, participants will learn other tips for formulating research questions and plan their research.
Making the Most of Your Time in an Archive: Archival Research Management, or...UNCResearchHub
Archival research can be exciting but also somewhat complex for a new researcher. This workshop draws on the expertise of long-time researchers to offer a roadmap for neophytes. Learn strategies for all stages of research: before you go; while you’re there; after the archive closes; and once you return home.
A discussion of the elements of a research question; general tips on the types of places likely to have data; and a range of things to consider when hunting data.
This slide deck is from a workshop that took place at the UNC Chapel Hill Davis Library Research Hub.
Collecting data is now easier than it has ever been. But, as data becomes more prolific, datasets become larger and more complex. How do we find meaningful patterns in our data? How can we communicate those patterns to others? Data visualization allows us to make sense of today’s ever evolving information landscape.
This workshop will introduce the history and basic principles of data visualization. Learn about best practices and resources for making an impact with your data through compelling charts, graphs and maps.
Turning Data into Infographics: An Interactive Workshop for Problem SolversUNCResearchHub
This workshop was given at the UNC Undergraduate Library on October 4, 2016. It steps through the process of finding data sources, exploring data, and ultimately creating a persuasive infographic using that data. A brief introduction to infographics and best practices are included.
Presentation by Dave Hansen of the UNC Law Library, March 27, 2015, about the scholarly communication issues around "orphan" works, those books or works of art or other copyrighted materials for which no author or rights-holding organization can be found.
Biological screening of herbal drugs: Introduction and Need for
Phyto-Pharmacological Screening, New Strategies for evaluating
Natural Products, In vitro evaluation techniques for Antioxidants, Antimicrobial and Anticancer drugs. In vivo evaluation techniques
for Anti-inflammatory, Antiulcer, Anticancer, Wound healing, Antidiabetic, Hepatoprotective, Cardio protective, Diuretics and
Antifertility, Toxicity studies as per OECD guidelines
Acetabularia Information For Class 9 .docxvaibhavrinwa19
Acetabularia acetabulum is a single-celled green alga that in its vegetative state is morphologically differentiated into a basal rhizoid and an axially elongated stalk, which bears whorls of branching hairs. The single diploid nucleus resides in the rhizoid.
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Unit 8 - Information and Communication Technology (Paper I).pdfThiyagu K
This slides describes the basic concepts of ICT, basics of Email, Emerging Technology and Digital Initiatives in Education. This presentations aligns with the UGC Paper I syllabus.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit-www.vavaclasses.com
4. What is Text Analysis
• What challenges do pediatric health providers
experience in their work?
• How do people talk about vaccines on Twitter?
• Which laws in North Carolina are Jim Crow laws?
• How do posts on social media affect the impact
of research articles?
• What information about cultural objects can be
found in eBay listings?
A Tool for Answering Research Questions
5. What is Text Analysis
Text analysis is the process by which
meaningful information is extracted
from unstructured text data.
6. What is Text Analysis
• Used in Social Sciences and Humanities fields
• Technique for systematically identifying patterns in
different types of communication
• Difficult and time consuming to do manually without
the help of digital tools
• Manifest content – what is said
quantitative techniques work better
• Latent content – what is meant
qualitative techniques work better
Text Analysis is related to Content Analysis
9. Qualitative Text Analysis
• Gathered information on the experiences
and perceptions of pediatric speech
language pathologists through in-depth
interviews with professionals
• Interview transcripts were thematically
coded using qualitative text analysis
software
EXAMPLE
Speech-Language Pathologists in Pediatric
Palliative Care: An International Study of
Perceptions and Experiences
Krikheli, L., Erickson, S., Carey, L. B., Carey-Sargeant, C., & Mathisen, B. A.
(2021). Speech-language pathologists in pediatric palliative care: An
international study of perceptions and experiences. American Journal of
Speech - Language Pathology (Online), 30(1), 150-168. doi:https://doi-
org.libproxy.lib.unc.edu/10.1044/2020_AJSLP-20-00090
10. Qualitative Text Analysis
• Decide what is to be coded – entire
response? Individual sentences?
• Decide what codes to use –
deductive or inductive?
• Select text and apply codes
• Analyze and visualize results
The Coding Process
16. Bag of Words
• Examined vaccine-related tweets
from a period in 2019 when a measles
outbreak occurred in the U.S.
• Used bag of words methods that
focus on how often words occur in
the text, a.k.a. term frequency
EXAMPLE
Studying Public Perception about
Vaccination: A Sentiment Analysis of Tweets
Raghupathi, V., Ren, J., & Raghupathi, W. (2020). Studying public
perception about vaccination: A sentiment analysis of
tweets. International Journal of Environmental Research and Public
Health, 17(10), 3464. doi:https://doi.org/10.3390/ijerph17103464
17. Bag of Words
What does “Bag of Words” Mean?
Corpus
(e.g., collection of tweets)
Nearly ALL innate structure is removed from the text including punctuation, white space and
paragraphs, the order in which words occur, etc., so we end up with an unstructured “bag” full of
jumbled-up words.
Document
(e.g. single tweet)
Bag of Words
18. Bag of Words
• Refers to transformations that
must be performed on the text
before analysis can occur
Preprocessing
• Tokenization
• Removal of stop words
• Stemming/Lemmatization
Common Preprocessing Steps
19. Bag of Words
• Breaks a document or corpus
into separate clusters of
text that become the unit of
analysis
Tokenization
• Word
• N-gram
Types of Tokenization
bigrams trigrams
This is a sentence. this is a sentence
20. Bag of Words
• List of words removed from
the corpus
• Include extremely common
words that aren’t usually very
informative
• Articles, pronouns, “to be”
verbs, etc.
Stop Words with stop words without stop words
21. Bag of Words
• Algorithm removes certain word
endings so similar words can be
counted together (cars -> car)
• Can result in inaccurate word
frequencies (caring -> car)
Stemming
• Takes linguistic morphology into
account, providing better accuracy
• Algorithm relies on detailed
dictionaries
Lemmatization
22. Bag of Words
• Often needs to be normalized by the
total number of words in a document
• Number of times a term appears
divided by the total number of terms
Term Frequency Method
• Uncommon terms are more
informative than common ones
• TF-IDF takes into account both the
frequency of a term in a
document and its overall scarcity
in the corpus
TF-IDF Method
23. Bag of Words
1. Go to voyant-tools.org
2. Click on
3. Choose
Tips
• What is the corpus? What are the documents?
How many documents are in the corpus? (Hint:
Summary tool)
• What is the most common 4-gram in the corpus?
In what document does it have the highest relative
frequency? (Hint: Phrases tool and Trends tool)
• Edit the stop words so that the top 20 terms are
more informative. (Hint: Terms tool - Options)
• Find a term that Shakespeare uses more often in
his early works than his later works. (Hint: Terms
tool and Trends tool)
• Look at the highest Significance scores for
Document Terms. Why are they all character
names? (Hint: Document Terms tool - Help)
EXERCISE
Shakespeare’s works in Voyant
Open tool in a new tab (click Export)
Change tool
Change options (more at bottom of tool)
Tool help
25. Classification
• Discovered Jim Crow and racially-based
legislation signed into law in North
Carolina between 1866 and 1967
• Used supervised and unsupervised
classification techniques to identify laws
with race-based language and the type of
legislation covered by those laws
EXAMPLE
On the Books: Jim Crow and Algorithms
of Resistance
Henley, A., Jansen, M., Bruckner, L., Byers, N., & Dalwadi, R. (2020). On the
Books: Jim Crow and Algorithms of Resistance White Paper.
https://doi.org/10.17615/hvz4-sr14
26. Classification
• Separate documents into different
groups depending on the language
used in the text
• Often called topic modeling when the
groups represent topics
Uses for Classification
• Supervised
• Unsupervised
• Rule based
Types of Classification
? ? ? ?
? ? ? ?
A B C D
A B C D
27. Classification
• Portion of the corpus must be
manually labeled to provide a training
set (very time consuming)
• Training set serves as an example of
correctly classified text (it’s the
supervisor)
• Machine learning algorithm uses the
labels supplied by the training set to
predict classifications for documents
in a test set, and eventually unlabeled
documents
• Often, many different algorithms are
trained, tested and compared to find
the best results, e.g., Naïve Bayes,
Random Forest, Support Vector
Machine, XGBoost
Supervised
ML Algorithm
Labeled Docs
Test Set
? ? ? ?
? ? ? ?
Training Set
A B C D
A B C D
A B C D
Predictions
A B C D
A B C D
A
A
A
A
A
B
B
B
C
C
C
C D
D
A
A
B
B C
28. Classification
• Does not require manually labeled text
and is less time consuming
• Due to lack of labeling, results are often
less accurate and harder to evaluate
• Researcher determines number of
groups the algorithm will organize the
data into
• Machine learning algorithm looks for
similarities in the data and creates
groups based on those similarities
• Many different types of algorithms
including Latent Dirichlet allocation, K-
Means clustering, word embeddings, etc.
Unsupervised Unlabeled Docs Number of Groups
A B C D
ML Algorithm
Predictions
A B C D
A B C D
29. Classification
1. Follow the steps at go.unc.edu/topmod.
2. Go to your Desktop and open the
output_html folder.
3. Double-click on all_topics.html
Tips
• If you had to come up with a name for
each topic, what would it be?
• Click on each topic and look at the top
five laws within it. Based on the text of
those laws, how representative are the
topic names you picked?
• How useful do you think the results of the
topic modeling algorithm are?
EXERCISE
Topic Modeling Tool
To learn more about the topic modeling
algorithm we are using, visit:
go.unc.edu/lda
(written by Google employee, Ria Kulshrestha)