Submitted: 2017/03
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and Labeling
Anastasia Zhukova
University of Konstanz
Universitätsstraße 10
Konstanz, Germany
anastasia.zhukova@uni-konstanz.de
ABSTRACT
While following the news, one can notice that the same story can have a different impact depending on which news agent tells it. One reason for this is how the facts are framed. Framing is described by the communication sciences as an instrument influencing how people perceive, interpret, and convey information. It can be achieved by a specific word choice and labeling that describe an event or problem from a particular perspective, e.g. positive or negative. To derive a frame, the social sciences usually perform a manual qualitative analysis, but recently computer-assisted quantitative approaches have become an essential way of conducting framing analysis. This work provides a literature review of the existing frame derivation methods based on the problem of word choice and labeling.
Keywords
Frame analysis, news analysis, word choice, labeling
1. INTRODUCTION
Consider reading news headlines reporting on the same event, a July 2015 incident in which Palestinians stoned a car with several police officers. The following two leave contradictory impressions: Irish Times, "Palestinian protester shot dead in West Bank", and Reuters, "Israeli officer kills stone-throwing Palestinian youth in West Bank" [1]. Reuters' headline provides the subject, action, object, and reason for the performed action; the Irish Times applies the label "protester" to a youth and switches the focus of the story towards the object, resorting to the passive voice and omitting the subject of the story. The word choice of "shot dead" compared to "kills" may also evoke visual associations and change the perception of the story. The overall impression can lead a reader to interpret the Irish Times headline more negatively than the Reuters one. Different news framing is the reason.
Framing is a conceptualization of the way people organize, perceive, and communicate information. It is an instrument of political science that communicates ideas and messages, and "defines issues" [46]. The term framing was first introduced by Tuchman [43]. A frame is a framing entity, defined as a strong communicative tool that compactly unites a set of ideas that need to be transferred to people, including problems, judgments, actions, causes, and solutions. In [16] Entman suggested a description of framing: "selecting of some aspects of a perceived reality and making them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation".
Usually a frame is a pattern of the most important parts of the messages that members of political parties, newspapers, or individuals with an underlying interest convey. A frame should be a rather short and simple message that reflects a particular event or group of people [17], and is memorable and reproducible, which enables its further spread, use, and evolution.
Politicians use frames to motivate people to act within a frame's boundaries and its idea [10], affecting attitudes and behaviors. Frames convey messages with symbolic elements and metaphors, utilizing stereotypes, word choice, and labels, which all together consolidate an idea into an entity that [46] calls a package. This package embeds cultural and individual perception that resonates most in a person's mind. Specifically chosen words have a great impact on people by referring to well-known associations, images from previous experience, and lexical tones.
The definition of a frame varies in its interpretation, and consequently, social scientists define and apply several methods for frame derivation. Regardless of specifics, all methods are either inductive or deductive. In inductive analysis, a set of texts is analyzed in order to define a frame as a piece of the message that these texts or news convey [46]. The task is to find important or, according to Entman, "salient" information, either by the frequency of influential words or by the words' impact on one's perception of the text. Deductive analysis works with predefined frames and existing codebooks which describe a frame, and determines whether evidence of a frame is found in a text.
The main challenge for computer-assisted frame analysis is finding a frame and its elements, as frame analysis remains mostly qualitative. Therefore, the research questions of this paper are: (1) how do scholars approach computer-assisted framing analysis, and (2) which methods focus the analysis on constructing or finding frames based on word choice and labeling?

The paper is organized as follows: we start by giving an overview of the forms of frame analysis and its general properties, then review the existing approaches and methods for finding a frame, and discuss current and possible solutions for framing w.r.t. the word choice and labeling problem.
2. FRAME ANALYSIS
Framing is a process of issue conceptualization and uses a frame as a tool. A frame is a system of organized ideas and messages, which are called attributes or devices [33]. Each device consists of highly influential words, forming a structure that acts as a trigger on people, aiming at a specific reaction based on a particular cultural, symbolic, and psychological background. [16] discussed information as influential and important, or as he defined it, "salient": first, based on word frequency. If one wants to emphasize a piece of information, it needs to be repeated. Second, the use of familiar symbols emphasizes the perception of the information and triggers fast interpretation of well-known concepts. All these methods activate a frame and make it prominent [46].
Framing attributes can be associated with a lens, providing specific boundaries and a perspective on a set of views [13]. [33] suggested four categories of structures for framing devices:
1. syntactical structure: refers to the inverted pyramid of news discourse and its structural elements, i.e. headline, lead, and main body, where words and phrases have influence in inverse order to their location in the text (technical devices);
2. script structure: represents the sequence of activities and components that a single event can consist of, i.e. the 5 W and 1 H questions: who, what, when, where, why, and how (framing devices);
3. thematic structure: defines an article as a theme or subtheme (framing devices);
4. rhetorical structure: consists of word and stylistic choices, metaphors, exemplars, catchphrases, depictions, labels, citations, and visual images, which all together increase the salience of a given point of view and substitute facts for their interpretation (rhetorical devices).
The word choice aims at a predefined reaction and interpretation, thus influencing the decision-making process [25]. A specific word choice changes the polarity or valence of the whole story; therefore, framing can introduce bias when operating with subjective words and tones [38]. This can be done by addressing people with specific terms and words that correspond to their background and previous experience. One concept can be represented by different words, and the selection of a particular word combination depends on the aim of a frame. Consider the examples: "Heart-wrenching tales of hardship faced by people whose care is dependent on Medicaid" is strongly biased towards pity compared to "Information on the lifestyles of Medicaid dependents" [2]. Well-informed people tend to have a solid opinion and position on various questions and will distinguish the difference between the two examples, but others can be influence-prone.
Labeling of an opponent intends to devalue their opinion with a strong negative association [20]. The most popular labels refer to ideas, political organizations, and activists, and form a bias by label [5]. Applying a label triggers fast image construction, and not rarely leads to image distortion compared to the rest of the content. For example, when conservatives are called "far right", it leads to a negative perception of the whole story, while radicals might obtain more positive labels, and the picture will change correspondingly.
Framing analysis can be classified as follows [26]: (1) framing and agenda-setting; (2) qualitative (text-based) or quantitative (number-based); (3) manual or computer-based frame coding; (4) inductive or deductive approaches.
Framing and agenda-setting. A frame, depending on the forms of the devices it consists of, represents either framing or agenda-setting. The main difference lies in their functions: agenda-setting tells people what to think about, while framing influences how to think about it [28]. In other words, agenda-setting provides the topics and themes that news cover. Framing plays the role of a second level of agenda-setting, and therefore describes the reasons for the salience of news elements and provides an interpretation of news stories.
Qualitative and quantitative analysis. A typical framing analysis is based on qualitative analysis, where coders derive a frame out of a set of articles. A coder is a social scientist who reads news articles, highlights the most significant information, finds actors and actions, defines the intonation, metaphors, and lexical choices in the text, and then groups the obtained frame elements by generalization and summarization [46]. The overall process tends to be slow and applicable only to a small number of articles. Unlike qualitative analysis, quantitative analysis describes statistical word properties and interrelations between words. Very basic approaches rely on word frequency, but the general idea is to convert text into numerical values and apply subsequent transformations and calculations.
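As a minimal illustration of this conversion step, texts can be turned into word-frequency vectors that later transformations operate on (toy documents; a real pipeline would add stemming, stop-word removal, etc.):

```python
# Turn raw texts into term-frequency vectors over a shared vocabulary.
from collections import Counter

docs = ["officer kills youth", "protester shot dead by officer"]
vocab = sorted({w for d in docs for w in d.split()})
# One count vector per document, aligned with the sorted vocabulary.
vectors = [[Counter(d.split())[w] for w in vocab] for d in docs]
print(vocab)
print(vectors)
```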
Manual and computer-based frame coding. Advantages and disadvantages go along with both manual and computer-based frame coding. A lack of "fully developed operational procedures" with respect to ambiguous frame construction was named in [33] as a reason why computer-based approaches were impossible. Framing that relies on metaphors and turns of speech as frame devices [45, 27] can also be infeasible to code automatically, since these carry hidden meaning and require human interpretation. Nonetheless, manual framing analysis has its own drawbacks. First, it is time-consuming, and it is hard to analyze texts at a large scale. Second, framing analysis can be biased, as it depends on the psychological and cultural background of a researcher. Thus, computer-assisted analysis tends to be more objective and reliable [41].
An analysis conducted with a computer-based approach can be repeated by other researchers, and the results obtained from the algorithms will be the same when applied several times, which is a concern for manually coded results. Moreover, for computer-assisted approaches, [10] states that it is important to predefine a universe of words describing a frame: it simplifies word selection during framing and makes the vocabulary more specific to the texts' topic, and thus more likely to correctly find a frame, at least in deductive analysis.
Inductive and deductive approaches. Framing analysis can also be described in terms of two approaches: inductive and deductive. An inductive approach tries to reconstruct frames from a given set of texts. It aims at finding both the general conveyed idea and its framing attributes. In deductive analysis, a frame is predefined, and the task is to search for the framing devices to confirm the presence of a frame [10].
3. RELATED WORK
Though framing analysis is typically a task performed manually by social scientists, in the past years research into computer-assisted approaches has been growing. It is important to highlight the term "assisted", because currently a lot of manual interaction exists, and some steps in the analysis rely on a researcher's knowledge and experience in how to use or interpret results. Nevertheless, the following section aims at giving an overview of utilized inductive and deductive approaches.
3.1 Inductive analysis
Inductive analysis involves procedures aiming at reconstructing or deriving frames from a set of given texts by finding similar features, which in Section 2 are defined as framing devices. Usually there is no prior information given except criteria for the preselection of news articles.
3.1.1 Most frequent words
The term "salient word", in the context of content analysis, usually describes words with a high frequency in the text. In framing analysis, which originated from content analysis, some approaches suggest that as long as a word remains frequent, it is a keyword and influences the content and perception of the information presented there. The methods described below are based on this word property but apply various post-processing operations.
PCA on cosine similarity matrix
One of the first scholars who suggested and performed computer-assisted framing analysis was [29], who called this approach frame mapping. The scholar wanted to analyze the frames of two competing groups of stakeholders, how they are covered in the news, and which terms and word choices are associated with them. The suggested method mainly relies on frame derivation based on clustering. The VBPro software family was used in the analysis. Before the analysis, several preprocessing steps were applied. Stopwords were removed before the word list with corresponding frequencies was obtained for each document. The most frequent words were selected as candidates for frame terms; their number was based on the expertise of the researcher. Then, terms with the same root were combined into one "word term". This transformation is similar to stemming, but additionally these "word terms" included synonyms from a manually developed dictionary. If a term had several meanings and some of them applied to a word within a text, then this word was removed, or manually tagged via VBPro to distinguish the meanings. Because the aim of the research was to identify specific terms that co-occur with each stakeholder, the researchers manually marked each document by adding a new word with a special symbol into the word list. The last step was to compute the cosine similarity matrix between all pairs of documents. If terms always co-occurred, the matrix value was 1, and 0 otherwise.
An eigenvalue extraction aimed at determining the most frequent pattern of co-occurrences between words. The result of this analysis step was a list of words with associated eigenvectors; using a multidimensional space, the values can be plotted on a concept map. Words that co-occur together are plotted closer to each other. The large quantity of terms led to a very dense plot, and therefore the words were clustered using an agglomerative hierarchical clustering method with cosine similarity as the distance metric. On each iteration the algorithm groups pairs of similar objects until it reaches one united cluster. The words forming the upper-level hierarchies named the obtained frames.
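The clustering stage can be sketched as follows. This is a toy reconstruction, not the original VBPro workflow; the terms and frequencies are invented for illustration:

```python
# Frame mapping sketch: cosine similarity between term profiles,
# then agglomerative clustering; upper-level clusters act as frames.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Rows: candidate frame terms; columns: documents; values: frequency.
terms = ["protester", "youth", "officer", "settlement"]
freq = np.array([
    [3, 0, 1, 0],   # "protester" appears mostly in docs 0 and 2
    [2, 0, 1, 0],   # "youth" co-occurs with "protester"
    [0, 4, 0, 2],   # "officer" co-occurs with "settlement"
    [0, 3, 0, 1],
], dtype=float)

# Cosine similarity: 1 = terms always co-occur, 0 = never.
unit = freq / np.linalg.norm(freq, axis=1, keepdims=True)
sim = unit @ unit.T

# Agglomerative clustering on cosine distance (1 - similarity).
dist = squareform(1.0 - sim, checks=False)
tree = linkage(dist, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")

frames = {}
for term, lab in zip(terms, labels):
    frames.setdefault(lab, []).append(term)
print(frames)
```

Cutting the dendrogram at two clusters separates the terms with disjoint co-occurrence profiles, mirroring how the upper-level hierarchies were read off as frames.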
Figure 1: Results of inductive frame derivation conducted
by hierarchical clustering [29]
The authors suggested that labels inserted into the texts would be described by regular words if these labels were clustered in the early stages. Figure 1 shows the obtained clusters, which refer to frames, the words contained in the clusters, and the eigenvector values used later to plot frames in 3D space. The inserted labels also had their own clusters and corresponding eigenvector values. Figure 2 shows two resources, represented by Property-Owner Advocates and Conservation Advocates, and frames whose proximity to the resources depicts how each resource is framed.
Figure 2: Derived frames and their comparison to the stud-
ied resources [29]
The obtained results present frames more as agenda-setting, describing each resource with respect to the topics that each source or actor of the discussion chose to support their position. However, the results do not concentrate on the lexical choice of the frames or on whether a frame has a positive, negative, or neutral influence on the perception of the topic or actor.
PCA on covariance matrix. [12] conducted a similar computer-based analysis on biotechnology and GMO products, with the goals of frame reconstruction, frame comparison, and analysis of lexical choice. The main idea of using the most frequent words came from [29], and for the analysis they used the WordStat and SPSS software. They also manually selected the top most frequent words with respect to their meaningfulness, interpretability, and absence of ambiguity, and the number was limited to 130 words. Having the list of frequencies of the preselected words per document, the researchers computed two covariance matrices, one for each newspaper. Principal component analysis with a varimax rotation was applied as a tool, which searched for relationships between words and also revealed the latent structure. Eigenvalues higher than 1 resulted in 8 meaningful frames for both the Missouri and the Northern California newspapers. If a word had a loading factor greater than or equal to 0.3, then this word was part of a cluster.
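The component-based grouping can be sketched with a plain eigendecomposition of the covariance matrix (toy data, no varimax rotation; the original study used SPSS, so this only illustrates the eigenvalue and loading thresholds):

```python
# PCA-style frame grouping: components with eigenvalue > 1 form
# candidate frames; words with |loading| >= 0.3 join the frame.
import numpy as np

words = ["gene", "crop", "safety", "risk"]
# Rows: documents; columns: per-document frequencies of preselected words.
X = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 6, 5],
    [1, 0, 5, 6],
    [5, 5, 0, 0],
], dtype=float)

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

frames = []
for k in range(len(eigvals)):
    if eigvals[k] <= 1.0:                  # keep eigenvalues > 1
        break
    loadings = eigvecs[:, k]
    frames.append([w for w, l in zip(words, loadings) if abs(l) >= 0.3])
print(frames)
```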
Figure 3: Frames formed by PCA on the most frequent
words between documents [12]
The researchers named the frames manually and compared the word choice between similar frames in the newspapers. They stated that the agriculture topic is framed differently by newspapers from the two states, and as seen in Figure 3 some differences exist (6 out of 8 frames are the same), but it is hard to discuss how a specific word choice changed the perspective on the issue. Most of the words are neutral nouns that describe a frame more as agenda-setting. A hint of word valence is given by the label "Frankenfoods", which refers to GMO plants as "mutant food", but the term lacks sentiment explanation and therefore obtains meaning only with the help of manual qualitative analysis.
Word co-occurrence
Self-organizing map. Yan Tian and Concetta M. Stewart conducted a news framing analysis of the SARS crisis aiming at finding the frames that CNN and BBC applied to cover the topic [41]. The CatPac (Category Package) software [48], which was used as the analysis tool, is based on a self-organizing map, an unsupervised artificial neural network, and aims at finding semantic relationships between concepts (words) that are represented as neurons. Several words can share the same meaning as one concept. The similarity of concepts is based on the word patterns used in the text.
As a preprocessing step, the researchers constructed one list of words representing all articles per newspaper; the program selects a range of the most frequent words and removes stop-words and verbs, and the final list of words was corrected by a researcher. The neural network is trained with this word list: a sliding window of a selected size runs over a text, and if two terms co-occur, the weight of their similarity is increased w.r.t. the learning rate, the activation values for each node (word), and the previously obtained weight [47]. The result is a matrix of trained weights, and in order to obtain frames, Ward's hierarchical clustering is applied. The results yielded two dendrograms, where the clusters were named manually.
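A simplified stand-in for this pipeline replaces the trained self-organizing map with raw sliding-window co-occurrence counts, which are then clustered with Ward's method (toy word stream, illustrative only, not the CatPac weight-update rule):

```python
# Sliding-window co-occurrence counts clustered with Ward's method;
# the two dendrogram branches play the role of manually named frames.
from collections import defaultdict
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

tokens = ("virus outbreak hospital quarantine " * 3
          + "market economy trade loss " * 3).split()
vocab = sorted(set(tokens))
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences of distinct word pairs within a window of size 3.
window = 3
cooc = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(tokens):
    for j in range(i + 1, min(i + window, len(tokens))):
        a, b = idx[w], idx[tokens[j]]
        if a != b:
            cooc[a, b] += 1
            cooc[b, a] += 1

# Ward clustering of the co-occurrence profiles, cut into two clusters.
tree = linkage(cooc, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
frames = defaultdict(list)
for w, lab in zip(vocab, labels):
    frames[lab].append(w)
print(dict(frames))
```

On this toy stream the health-related and economy-related words end up in separate branches, which is the structure the dendrograms in the study were read for.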
Figure 4: Result of inductive analysis with Self-organizing
map
The results in Figure 4 depict the obtained frames and their constituents. As already seen in the previous approaches, the extracted examples of word choice represent keywords. They rather show an agenda-setting structure, introducing sub-topics of the crisis as various perspectives, than answer the question of how to think about the issue and what the difference in interpretation of the topic is.
Semantic network. [23] conducted an analysis of the impact of artificial sweeteners in the media with the assistance of the TextSTAT and Pajek programs. The authors suggested the construction of semantic maps to reveal hidden word meaning, which they called implicit frames. Texts were represented as lists of the 100 most frequent words, and to draw a similarity measure between words, the scholars calculated cosine similarity matrices for each period they were interested in. In order to find the constituents of a frame, a threshold was derived from the mean cosine value of the lower triangle of the obtained matrix. Multidimensional scaling can be applied to reduce dimensionality, but the authors plotted the matrix as a network, where nodes were words, the number of edges represented semantic relations to other words, and the length of an edge represented the similarity measure of the words. Spatial proximity of terms led to the clusters, or frames, depicted in Figure 5.
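The thresholding step can be sketched directly: edges are drawn wherever the cosine similarity exceeds the mean of the lower triangle of the similarity matrix (the word list and similarity values below are invented for illustration):

```python
# Build a semantic network from a cosine similarity matrix using the
# mean of the lower triangle (diagonal excluded) as the edge threshold.
import numpy as np

words = ["sweetener", "cancer", "risk", "sugar", "diet"]
sim = np.array([
    [1.0, 0.2, 0.3, 0.8, 0.7],
    [0.2, 1.0, 0.9, 0.1, 0.2],
    [0.3, 0.9, 1.0, 0.2, 0.3],
    [0.8, 0.1, 0.2, 1.0, 0.6],
    [0.7, 0.2, 0.3, 0.6, 1.0],
])

tri = np.tril_indices_from(sim, k=-1)
threshold = sim[tri].mean()

# Adjacency: connect word pairs whose similarity exceeds the threshold.
adj = (sim > threshold) & ~np.eye(len(words), dtype=bool)
edges = [(words[i], words[j]) for i, j in zip(*np.nonzero(np.triu(adj)))]
print(f"threshold={threshold:.2f}", edges)
```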
Figure 5: An example of a semantic network used to form frames around the term "artificial sweetener" [23]

Based on Figure 5 it is hard to derive meaningful frames. It depicts the relations, and the obtained results could tell how the words are related to each other, but neither the number of connections nor their length gives an objective value for how to read it. Moreover, the words used for the nodes describe the agenda-setting of this problem in the newspapers, not how this information can be interpreted.
3.1.2 Keyword extraction
[42] gave an overview of linguistic instruments for frame construction with WordSmith [39]. They stated that using keywords follows Entman's description of a frame in terms of the salience of words. "Keyness" was based on a log-likelihood that compares the number of occurrences of a given word in the corpus with respect to a reference corpus representing the language of the text [15]. This approach is more sophisticated than using just a simple list of the most frequent words, and takes into account the context in which the word choice is used. Nevertheless, keywords were not said to represent a frame, but helped to establish its key topic and central meaning. Word concordance extracted the words to the left and right of the keyword, thereby broadening the context in which the keyword was used and revealing candidates for frame devices. Frame construction itself was based on qualitative methods, which formed the frame packages suggested by [45].
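One common formulation of log-likelihood keyness is Dunning's G2 statistic; whether WordSmith uses exactly this variant is not guaranteed by the sources above, so the following is a hedged sketch with toy counts:

```python
# Dunning-style G2 keyness of a word in a study corpus against a
# reference corpus (toy frequencies, illustrative only).
import math

def keyness(a, b, c, d):
    """a = word frequency in the study corpus, b = frequency in the
    reference corpus, c and d = total token counts of the corpora."""
    e1 = c * (a + b) / (c + d)   # expected frequency in study corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in reference
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

# "protester": 40 hits in 10,000 study tokens vs 50 in 1,000,000
# reference tokens -> strongly overrepresented, high keyness.
print(keyness(40, 50, 10_000, 1_000_000))
# A word occurring at the same rate in both corpora scores near zero.
print(keyness(10, 1_000, 10_000, 1_000_000))
```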
Performing a substantial part of the analysis manually makes this research a semi-automated approach, but it is subject to a statistically confirmed selection of framing devices. Word choice is based on the predefined frame components, but the frame coding itself and the determination of the word choice of each frame depend on manual coding.
3.1.3 Centering Resonance Analysis
[34] performed a comparative analysis of terrorism coverage in the UK and USA. The scholars used Centering Resonance Analysis (CRA) [11] on the whole set of documents. CRA finds the most important words with respect to their frequency and influential positions. CRA creates a network of the objects and subjects of the text, and therefore selects nouns and noun phrases as nodes in this network. Other parts of speech, e.g. verbs, are excluded from the main components but are used to link different nodes. The researchers stated that "nouns denote conceptual categories that provide more salience discourse information than verbs" [11]. A word has a high value of betweenness centrality, which is represented "by the number of times the words were linked in the text according to the rules above". The relative influence of a word denotes the average number of steps required to get from a considered node to the other nodes in the network, and is calculated as

I_i^T = \frac{\sum_{j<k} g_{jk}(i)/g_{jk}}{(N-1)(N-2)/2}

where g_{jk} is the number of shortest paths connecting the j-th and k-th words, g_{jk}(i) is the number of those paths that pass through word i, and N is the number of words in the network.
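This formula is normalized betweenness centrality, and it can be computed directly on a small graph. The noun network below is hypothetical, not built with the original CRA linking rules:

```python
# Normalized betweenness centrality (the relative influence I_i^T)
# computed by enumerating all shortest paths on a toy noun network.
from itertools import combinations
from collections import deque

edges = [("attack", "police"), ("attack", "city"),
         ("police", "officer"), ("city", "mayor"),
         ("attack", "terrorism")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
nodes = sorted(graph)
N = len(nodes)

def shortest_paths(src, dst):
    """All shortest paths between src and dst (BFS on a small graph)."""
    paths, best = [], None
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break
        if path[-1] == dst:
            best = len(path)
            paths.append(path)
            continue
        for step in graph[path[-1]]:
            if step not in path:
                queue.append(path + [step])
    return paths

# I_i^T: for every pair j < k, add g_jk(i)/g_jk, then normalize.
influence = {n: 0.0 for n in nodes}
for j, k in combinations(nodes, 2):
    paths = shortest_paths(j, k)
    for i in nodes:
        if i not in (j, k):
            influence[i] += sum(1 for p in paths if i in p) / len(paths)
norm = (N - 1) * (N - 2) / 2
influence = {n: v / norm for n, v in influence.items()}
print(max(influence, key=influence.get), influence)
```

The hub node "attack" lies on most shortest paths, so it receives the highest relative influence, matching the intuition behind CRA's influential positions.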
An example of the obtained results of a comparison between two newspapers is shown in Figure 6. The words of each newspaper are grouped such that the more interrelated ones appear closer to each other, yielding frames. The betweenness coefficient was used to measure the similarity of the obtained frames and also of the words within a frame, but the researchers did not provide these numbers to compare the results. [21] conducted a similar analysis based on CRA, but their results did not provide any additional input for comparing results.
Figure 6: An example comparison between similarity and
difference of framing terrorism between newspapers of two
countries [34]
The whole analysis is performed with a computer-based approach, but was evaluated by manual discourse analysis. The results show the coverage of topics within the issue. Some words can even give an overview of the valence of the news coverage, but because some parts of speech were omitted and adjectives did not receive enough influence in the network, it is hard to derive specific framing devices that would describe a frame.
3.1.4 Latent Semantic Analysis
[40] based the detection of the "he" and "she" frames on Latent Semantic Analysis (LSA). LSA [14] is a method from Information Retrieval that is also frequently used in social science applications. Its key feature is the possibility of detecting words that represent similar concepts, synonyms, or semantically related concepts. It represents texts as a matrix of words in texts, with either word frequencies or TF-IDF scores as values. The main objective was to determine how men and women were represented in media news in 1996-1997 and which terms were chosen to describe them. The results are shown in Figure 7, where semantically related terms are shown in conjunction with their cosine similarity.
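A minimal LSA sketch along these lines: an SVD of a word-by-document frequency matrix, then cosine similarity in the reduced space (the matrix below is a toy; the original study used real news corpora):

```python
# LSA sketch: rank-2 SVD of a word-document matrix, then cosine
# similarity between word vectors in the reduced space.
import numpy as np

words = ["he", "leader", "she", "mother", "girl"]
# Rows: words; columns: documents; values: raw frequencies.
X = np.array([
    [4, 3, 0, 1],
    [3, 4, 1, 0],
    [0, 1, 4, 3],
    [1, 0, 3, 4],
    [0, 1, 4, 4],
], dtype=float)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
vecs = U[:, :k] * S[:k]          # word vectors in the latent space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

i = {w: n for n, w in enumerate(words)}
print(cos(vecs[i["she"]], vecs[i["mother"]]))  # semantically close
print(cos(vecs[i["she"]], vecs[i["leader"]]))  # semantically distant
```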
Figure 7: Results of semantically related words and cosine similarity values for the defined frames [40]

The scholars wanted to test the valence of the context around the two pronouns, and the expectation that the pronoun "he" would be more positive than "she" was confirmed by a univariate analysis of variance (ANOVA), where they used valence as the dependent variable and gender as the independent one. With this analysis they also confirmed the hypothesis that the pronoun "she" would have more gender-determined labels; indeed, in Figure 7 we see that labels such as "mother", "woman", and "girl" have high similarity values and can represent one entity and its characteristics. All in all, though the context of the pronoun "he" was covered 9 times more frequently than that of "she", the analysis allowed a comparison of word instances w.r.t. their coverage. The interpretation of word valence is based on a qualitative analysis of the obtained word choice and is not automatically presented to the users.
3.1.5 Keyword-weight model
The goal of [35] is to address the problem of media bias and also to overcome the slant of a frame that narrows a topic to a specific perception. They proposed the NewsCube system, which utilizes (1) a keyword-weight model and (2) framing cycle-aware clustering. The keyword-weight model is discussed as an optimal solution between simple keyword extraction by frequency and complex syntactic and semantic parsing, which often yields ambiguous results. The architecture of the system is depicted in Figure 8.
Figure 8: NewsCube architecture [35]
The importance of words depends on the news pyramid structure: the most prominent words and salient information appear in the headline, then the sub-head, the lead, and only then in the main text. Therefore, the suggested news structure-based extraction not only counts word frequencies but also weights them considering the location of a word. The results
Figure 9: ASSIST software architecture [6]
are normalized with respect to the length of the structural
element where the word was found.
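The structure-based weighting can be sketched as follows; the per-element weights are illustrative guesses, not the values used in NewsCube:

```python
# Structure-based keyword weighting: counts are weighted by the
# structural element a word appears in and normalized by its length.
from collections import Counter

article = {
    "headline": "officer kills stone-throwing youth",
    "lead": "an officer shot a youth who threw stones at a car",
    "body": "witnesses said the car carried several officers "
            "the youth was taken to a hospital",
}
# Hypothetical weights reflecting the inverted pyramid.
weights = {"headline": 4.0, "lead": 2.0, "body": 1.0}

scores = Counter()
for part, text in article.items():
    tokens = text.split()
    for word, count in Counter(tokens).items():
        # Weight by location, normalize by element length.
        scores[word] += weights[part] * count / len(tokens)
print(scores.most_common(3))
```

Words from the headline dominate the ranking even when body words occur more often, which is the intended effect of the pyramid weighting.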
For the frame derivation they used the concept of covering aspects over time in a framing cycle. This comes from a property of news issues: when an issue has just appeared, all news media use the same source of information covering the basic facts, and then extend it with additional sources. The additional sources highly depend on the supported bias. The framing cycle-aware clustering focuses on differentiating the head group of common articles from the tail group by calculating the commonness and uncommonness between articles and then splitting them in 2D space. The measures of (un)commonness are based on cosine similarity. The obtained values depend on keyword weights and the corresponding commonness and uncommonness of keywords within an article, which describe how often keywords appear in a considered set of articles.
With respect to the research question, the keyword-weight model represents an improvement over the previously observed solutions based on simple most-frequent-word extraction in Section 3.1.1. On the other hand, the approach simplifies the model to the syntactic structure, avoiding the extraction of semantic information and rather utilizing it as cosine similarity. It allows readers to obtain a more objective scope of news articles on an issue, but does not explain or construct frames that change perspective with a particular word choice.
3.1.6 Named entity recognition and sentiment analysis
[6] suggested the ASSIST system for approaching the framing analysis problem. Their solution consists of three modules: named entity extraction, term extraction, and sentiment analysis, thus developing a text mining platform to support framing analysis. The architecture of the ASSIST software is represented in Figure 9.
Named entity recognition (NER) allows a user to identify the main roles in a text. The authors pointed out that the roles and locations typically used in NER alone are not sufficient for their task, and therefore they expanded the entity types and recognized 26 categories. This resulted in better recognition of the main topics or frames in the text and also in a semantic annotation of the words, but it also highly depended on the algorithm of the particular NER implementation. The BaLIE semi-supervised NER [30] used in ASSIST faced the problem of defining categories for the same word belonging to different entities, e.g. a name and a town.
Term extraction is similar to keyphrase extraction. It focuses on the salient linguistic elements that form a specific topic or concept [19, 4]. These concepts can also represent the word choice and labeling selected by the author of a news article to describe an event or situation. TerMine combined the ideas of term formation patterns and statistical information about a term. The authors provided examples where labels such as "Big Brother" and "DNA database" were extracted from newspaper articles about ID cards.
The sentiment analyzer was the HYSEAS software [36]; it determined the tone of the words used in the text and produced a summarized score per sentence. The module accomplished the task of determining the positive, negative, or neutral effect on the perception of an article, though it faced accuracy problems, especially with misclassified neutral sentences.
The suggested system was developed to aid social scientists in performing semi-automated framing analysis. The outputs of the modules required further qualitative analysis; the system did not include an aggregated result-analyzing module or visualization, but rather provided three disjoint outcomes for further processing.
3.2 Deductive analysis
Deductive analysis solves the task of determining specified frames in a set of given texts. Methods either treat a frame as a whole or split it into framing devices and search for a frame by its parts. For the following approaches, news articles are already coded, mostly manually, and the task of computer-assisted deductive analysis is to increase the speed and accuracy of the frame search in the remaining articles.
3.2.1 Classification
Classification is the Machine Learning (ML) task of identifying the category of an observation based on a given set with known membership. To train a model, one needs a dataset with labels that specify class membership. Then, roughly following the Knowledge Discovery in Databases (KDD) pipeline [3], model features are constructed, resulting in a final dataset that is split into two parts for training and testing purposes.
Logistic Regression
The research of [8] measures how much of each of 16 frames is present in the smoking, immigration, and same-sex marriage issues. A binary logistic regression classifier was trained for each frame and issue, resulting in 16 × 3 = 48 classifiers in total; a binary classifier decides whether a frame is present or not. As features, the researchers suggested a binary code for each word in a framing code book: a word found in an article gets the value 1, and 0 otherwise. The prediction result is a real value that measures the probability of a frame in the considered article. Classification results for the smoking issue are depicted in Figure 10.
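The feature coding and per-frame classifier described above can be sketched as follows; the code-book words and toy articles are invented for illustration and are not taken from [8]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical code-book vocabulary for one frame (illustrative only).
CODEBOOK = ["tax", "ban", "health", "risk"]

def binary_features(article):
    """1 if a code-book word occurs in the article, 0 otherwise."""
    tokens = set(article.lower().split())
    return [int(w in tokens) for w in CODEBOOK]

# Toy training set: label 1 = frame present, 0 = frame absent.
articles = [
    "the tax on cigarettes and the ban improve health",
    "a new ban cites the health risk of smoking",
    "the election results were announced today",
    "the local sports team wins the championship",
]
labels = [1, 1, 0, 0]

X = np.array([binary_features(a) for a in articles])
clf = LogisticRegression().fit(X, labels)

# Real-valued output: probability that the frame is present in a new article.
test_x = np.array([binary_features("officials debate the health risk")])
frame_probability = clf.predict_proba(test_x)[0, 1]
```

In the setting of [8], one such classifier would be trained per frame and per issue, yielding the 48 classifiers mentioned above.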
The area under the curve (AUC) is a standard measure of classification correctness; it "expresses the probability that the classifier will rank a positive document above a negative document" [32]. The researchers used it to show that logistic regression in this approach performed better than a random classifier, and that some devices have more prominent words and are more easily distinguished than others.

Figure 10: Accuracy of frame search with logistic regression for the smoking issue [8]

Figure 11: Highest-impact words in classifiers for the smoking issue [8]
The most significant words of the best detected frames can be identified. To do so, it is suggested to calculate the product of a feature's (word's) learned coefficient and the feature's average value in the testing set. This value favors frequent terms and lowers the influence of infrequent ones. Figure 11 depicts an example of the three best represented frames and their framing devices for the smoking issue.
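The impact score described above, i.e. the learned coefficient multiplied by the feature's average value in the testing set, can be sketched with toy data (the vocabulary and matrices are invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary word-presence matrices (rows: articles, columns: code-book words).
vocab = ["tax", "ban", "health", "sports"]
X_train = np.array([[1, 1, 1, 0],
                    [0, 1, 1, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 1]])
y_train = [1, 1, 0, 0]
X_test = np.array([[1, 0, 1, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 1]])

clf = LogisticRegression().fit(X_train, y_train)

# Impact of a word = learned coefficient * average feature value in the
# test set; frequent words are favored, infrequent ones are damped.
impact = clf.coef_[0] * X_test.mean(axis=0)
ranked = sorted(zip(vocab, impact), key=lambda p: p[1], reverse=True)
```

Sorting by this score surfaces the words that contribute most to detecting the frame, as in Figure 11.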
The results confirm the importance of well-defined framing devices and of the code book. The number in brackets in Figure 10 shows the prevalence of each frame in the texts, which determines the order in which the frames are depicted, and there is no obvious dependency between frame prediction quality and prevalence. The accuracy of a classifier therefore increases with how correctly the word choice represents a frame.
Ensemble of logistic regressions
[32] performed deductive analysis as a classification task based on an ensemble of logistic regression functions. They represented each article as a bag-of-words model and calculated TF-IDF scores for each term. Instead of the raw TF component they applied the sub-linear frequency scaling 1 + log(TF), used l2 normalization, and smoothed the IDF weights by adding 1. Each frame had 3 to 4 framing devices, and each article in the training and testing sets was manually labeled 1 or 0 depending on whether a framing device was found in the text. Factor analysis on the framing devices described which ones are more influential, and, as can be seen in Figure 12, framing devices C2 and E2 were discarded due to low factor loadings and coherence with frames.
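This weighting scheme can be approximately reproduced in scikit-learn: sublinear_tf replaces TF with 1 + log(TF), and smooth_idf adds one to document frequencies. The corpus below is invented, and the exact weighting in [32] may differ in detail:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "asylum seekers cross the border",
    "the border agency reports rising numbers",
    "parliament debates the asylum law",
]

# sublinear_tf=True: TF -> 1 + log(TF); smooth_idf=True: add 1 to document
# frequencies; norm='l2': unit-length article vectors.
vectorizer = TfidfVectorizer(sublinear_tf=True, smooth_idf=True, norm="l2")
X = vectorizer.fit_transform(corpus)

# Each row is an l2-normalized TF-IDF vector for one article.
row_norms = (X.multiply(X)).sum(axis=1)
```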
Manually annotated documents (5875) were used for training and testing purposes based on ten-fold cross validation. To measure the accuracy of the results, the researchers applied two metrics: agreement between coders or classifiers in an ensemble, and AUC. The results for both metrics are depicted in Figures 12a and 12b, respectively.

Figure 12: Results of frame identification with human coding, a single logistic classifier, and an ensemble of classifiers for each of the indicator questions [32]: (a) intercoder and classifier agreement; (b) AUC of the ROC for automated classification prediction
The classifier ensemble showed better results than a single classifier, and a comparison with a small manually annotated data set (randomly chosen coders annotated 159 articles, approximately 3% of the overall number, multiple times to calculate inter-coder agreement) yielded somewhat better prediction for 7 of 11 framing devices. Nonetheless, some framing devices are classified better than others. AUC indicated that the ensemble is more accurate in prediction by 20 percentage points on average.
The approach revealed accurate prediction results for some framing devices and poor results for others. The scholars attributed this to the ambiguous interpretation of the formulated frames and to complex message characteristics. Comprehensive word choice and clearly explained concepts would help to improve the results.
[9] applied similar measures and discussed the performance of the holistic approach, where a frame is coded as a whole, against the indicator-based approach, which is frequently used for manual deductive coding and consists of searching for a frame by determining the presence or absence of its attributes. In addition to AUC and Krippendorff's Alpha (KA) for inter-coder agreement, they stated that the accuracy (AC) of coincidence between manual and computer-assisted coding must be used. All three values together evaluate the correctness of a frame found in a text.
The results showed that, with documents represented by TF-IDF in the classification task, the holistic approach performed better than the indicator-based one. Moreover, a certain drop in the accuracy of supervised machine learning (SML) algorithms can be noticed on a set of documents originating from a different source than the training set. The experiment showed results ranging from 0.79 to 0.96 for generic frames such as conflict, economic consequences, morality, and human interest, from which the scholars concluded that SML is suitable for issue-specific deductive analysis.
Support Vector Machines
[18] attempted to identify differences in word choice between four news outlets. For the analysis they removed stop words, applied the Porter stemmer, and extracted unigrams, bigrams, and trigrams of words to represent texts as a bag-of-words model based on TF-IDF. To find similar articles, or "mates," they used the Best Reciprocal Hit (BRH) algorithm, which works similarly to bioinformatics algorithms that identify similar genes. Cosine similarity over the resulting vector space model allowed extracting the top n nearest neighbors for each document. The scholars also used the article source as the label for the training data.
A linear SVM classifier was trained and tested with ten-fold cross validation, and the performance was measured by the break-even point (BEP), "which is a hypothetical point where precision (ratio of positive documents among retrieved ones) and recall (ratio of retrieved positive documents among all positive documents) meet when varying the threshold" [18]. One linear classifier was used for each pair of news outlets. The researchers stated that BEP reflects a measure of separability between news outlets' lexical choice. Figure 13 depicts BEP values with the corresponding 2D MDS representation and an example of the most influential words with the highest TF-IDF scores for two news outlets.
Figure 13: BEP metric used to compare the word choice of news outlets [18]
Though the result yielded separable clusters of the most specific words for each outlet, the approach is rather straightforward and applied SVM, which generally performs well for text classification problems [24]. Moreover, the problem was addressed as a comparison of word choice between news outlets, not as a search for frames within them.
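The TF-IDF representation and the cosine-similarity nearest-neighbor step can be sketched as follows (toy articles; stemming omitted). This is not the authors' BRH implementation, which additionally requires the hit to be reciprocal between two outlets:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for articles from different outlets.
articles = [
    "troops enter the city after heavy fighting",
    "soldiers enter the city following heavy fighting",
    "stock markets rally on strong earnings",
    "markets rally as earnings beat forecasts",
]

# Uni- to trigram bag-of-words model with TF-IDF weights, as in [18].
vec = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
X = vec.fit_transform(articles)
sim = cosine_similarity(X)

def top_mates(i, n=1):
    """Indices of the n most similar articles to article i (excluding i)."""
    order = np.argsort(-sim[i])
    return [j for j in order if j != i][:n]
```

For example, the first two articles share most of their n-grams and are found as each other's nearest neighbors, while the market articles form a separate pair.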
Supervised Hierarchical Latent Dirichlet Allocation
Topic modeling (LDA) is usually utilized as an approach for inductive framing analysis, but [31] suggested the Supervised Hierarchical Latent Dirichlet Allocation (SHLDA) method, based on LDA, which aims at revealing not only the agenda-setting constituent but also how the issue is framed and how people speak about it. The author describes a frame as a second level of agenda-setting, containing the latent meaning of the text. LDA, also known as a topic modeling technique, is used to assign words from texts to abstract topics.
SHLDA is a supervised method that reveals topic hierarchies and represents a frame with its most probable words. Each document is assigned a response value (a label or numeric value) that represents the author's perspective, e.g. sentiment or ideology. A text is represented as a bag of sentences, and a sentence as a bag of words. Each word is then assigned in an iterative way to the frame it represents, and the output of the model is "a hierarchy of topics which is informed by a label" [31].

Figure 14: Results of the SHLDA algorithm: formed frames and their development and polarization over time by each political party [31]

Figure 15: Defined polarity of words derived by SHLDA [31]
Figure 14 depicts the result of the analysis. The lower levels of the topic hierarchy are more specific about word choice and show how the two political parties framed the issue according to their interests. The concept of the approach is similar to the previously described research with SVM, in that the analysis lies between deductive and inductive methods: on the one hand, a label or frame is pre-given and does not need to be derived, but on the other hand, the framing devices of the frame are unknown, and the task is to find them. Apart from the topic hierarchy, the model provides a lexical regression parameter for each word, where the highest values show a positive association of a word and the lowest values a negative one. The lexical regression parameter allows analyzing the results not only on the topic level but also by specifying the valence of each word (Figure 15).
3.2.2 Hierarchical clustering
[27] proposed a clustering method for their deductive analysis of the biotechnology issue. They described the biotechnology frame with the framing devices suggested by Entman: problem definition, causal attribution, moral evaluation, and treatment recommendation. The manual inductive analysis relied on existing code-books for this topic and filled the framing devices' values with text elements obtained from the set of texts, which resulted in a specific code-book for the particular research.
The idea of the suggested deductive method is to manually code the presence or absence of each attribute element for each article, use intercoder agreement to validate the model features, and then cluster the elements with Ward's hierarchical method. Clustering the small frame elements revealed hidden connections between them and enabled deriving different frames for various time periods, determining in parallel the significance of each frame.
Though the authors performed all coding of the frame elements manually, as well as assigning meaningful names to the frames obtained from the clusters, the approach allowed obtaining 3 clusters, i.e. 3 frames, statistically supported by heterogeneity measures.
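Ward clustering of manually coded frame elements can be sketched as follows; the element names and the presence/absence coding are invented for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy presence/absence coding of frame elements across articles:
# rows = frame elements, columns = articles (hypothetical data).
elements = ["risk", "regulation", "progress", "economy", "ethics", "morals"]
coding = np.array([
    [1, 1, 0, 0, 1, 1, 0, 0],   # risk
    [1, 1, 0, 0, 1, 0, 0, 0],   # regulation
    [0, 0, 1, 1, 0, 0, 1, 1],   # progress
    [0, 0, 1, 1, 0, 0, 1, 0],   # economy
    [0, 1, 0, 0, 1, 1, 0, 0],   # ethics
    [0, 1, 0, 0, 1, 1, 0, 1],   # morals
])

# Ward hierarchical clustering groups co-occurring elements into frames.
Z = linkage(coding, method="ward")
frame_ids = fcluster(Z, t=2, criterion="maxclust")
frames = {fid: [e for e, f in zip(elements, frame_ids) if f == fid]
          for fid in set(frame_ids)}
```

Naming the resulting clusters as frames remains a manual, qualitative step, as in [27].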
3.2.3 Homogeneity analysis
[45] performed framing analysis on the topic of refugees in Belgium, aiming to find and measure the presence of the derived victim and intruder frames in two newspapers. The author suggested a qualitative method of frame derivation, which he later formalized in [46], and applied it in the inductive analysis. Its result is a code-book with specified framing and reasoning devices and particular values for each frame. A part of the code-book is shown in Figure 16, where numbers in brackets represent framing devices for the victim frame and letters represent those for the intruder frame.
Figure 16: A sample frame matrix with framing devices [45]
The coders needed to indicate whether the framing devices appeared or not. Two to three coders worked on these articles; therefore, Cohen's kappa was applied to calculate the intercoder reliability of each device, and devices with values below 0.6 were discarded. To determine which devices had greater influence, the scholars relied on homogeneity analysis by means of alternating least squares (HOMALS), which serves a similar purpose to MDS and projects values onto a lower-dimensional space while preserving the degree of significance of each value. Framing devices plotted close to each other in the two-dimensional space indicated similarity of articles. Figure 17 depicts the result of the HOMALS analysis. The clusters formed by the framing devices are clearly separable: intruder-frame devices occupied the left side of the plot, whereas victim-frame devices occupied the right side.
The authors described the separation into the upper and lower parts of Figure 17 as a distinction between journalists' perspectives: the bottom part is considered to have an in-group attitude, using pronouns such as we/our while talking about the issue, whereas the top part has an out-group position, addressing the problem with pronouns such as they/their.

Figure 17: Homogeneity analysis of framing devices between intruder-frame (left) and victim-frame (right) [45]

Figure 18: Average scores for intruder-frame and victim-frame representing the difference between coverage in Flemish-language (orange) and French-language (green) newspapers [45]
As a further step, the scholars analyzed the devices positioned in the center of the plot. They qualitatively derived that these devices are used interchangeably within articles and therefore could not be clearly assigned to one of the clusters. To measure "how much" of each frame is detected in an article, they suggested using Dimension 1 to scale the influence of each particular framing device and summing the influences into a resulting index. Consequently, each article had two indices: one for the intruder frame and one for the victim frame. Grouped by news outlet, these values are depicted in Figure 18, where a green half-circle represents the French-language newspapers and an orange one the Flemish-language newspapers.
The results show that 50% of the frames were shared between Flemish-language and French-language newspapers; while the French-language newspapers tended to refer to the victim frame, the Flemish-language outlets usually used both.
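The index computation can be sketched as follows; the device names and Dimension-1 weights are invented stand-ins for the HOMALS loadings in [45]:

```python
# Hypothetical Dimension-1 weights for framing devices: negative values
# lean toward the intruder frame, positive values toward the victim frame.
device_weights = {
    "illegal_entry": -0.8,
    "threat_to_jobs": -0.6,
    "family_separation": 0.7,
    "fleeing_war": 0.9,
}

def frame_indices(devices_present):
    """Sum device weights into an intruder index and a victim index."""
    intruder = sum(-w for d, w in device_weights.items()
                   if d in devices_present and w < 0)
    victim = sum(w for d, w in device_weights.items()
                 if d in devices_present and w > 0)
    return intruder, victim

# An article coded with two intruder devices and one victim device:
intruder_idx, victim_idx = frame_indices(
    {"illegal_entry", "threat_to_jobs", "fleeing_war"})
```

Each article thus receives two scalar indices that can be averaged per news outlet, as in Figure 18.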
An important criterion of word choice is that it covers exclusively the perspective of each particular frame. The paper demonstrated that well-formulated framing devices allow comparing word choice in a quantitative way. For example, apart from the newspaper comparison, the suggested index allows analyzing how each frame was referred to during a given time period.

Figure 19: Framing devices of Israel's position in the Hamas conflict [44]
3.2.4 Semantic Network Analysis
[44] conducted deductive analysis based on Semantic Network Analysis (SNA). They represented sentences as semantic statements forming a network. The core object is a predicate representing an event. An event has a subject (the actor of the action), an object (the actor at whom the action is addressed), and a source (the origin of the event or statement). The sentences are parsed with SNA into semantic constituents, and given the overall semantic network of words and their roles, the analysis of a text becomes a search on the network. Figure 19 depicts an example of the network.
Since framing devices also consist of a subject, predicate, and object, the search for a frame is performed as a rule-based search for the elements of the framing devices. Based on this, the overall structure of the analysis comprises (1) syntactic analysis by parsing the text, (2) role extraction for the words in a sentence, and (3) determining frames by rule-based relation extraction from the obtained network according to the framing devices.
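Step (3), the rule-based matching of framing-device patterns against subject-predicate-object triples, can be sketched as follows. The triples and rules are invented; a real system obtains the triples from parsing and role extraction as in [44]:

```python
# Toy semantic network: (subject, predicate, object) triples.
triples = [
    ("hamas", "fire_at", "israel"),
    ("israel", "attack", "hamas"),
    ("un", "criticize", "israel"),
]

# Hypothetical framing-device rules: subject/predicate/object patterns,
# where None matches anything.
rules = {
    "israel_as_target": (None, "fire_at", "israel"),
    "israel_as_aggressor": ("israel", "attack", None),
}

def match(rule, triple):
    """True if every non-None slot of the rule equals the triple's slot."""
    return all(r is None or r == t for r, t in zip(rule, triple))

# For each framing device, collect the triples in the network that match it.
found = {name: [t for t in triples if match(rule, t)]
         for name, rule in rules.items()}
```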
For the evaluation, the authors coded 100 test sentences manually and, to discuss the correspondence of manual and automated coding, used inter-coder reliability values such as Cohen's kappa and Krippendorff's alpha in conjunction with the F1 score. Because thresholds for inter-coder reliability values are not strictly defined, the inference from this comparison has no solid basis; nevertheless, the authors consider the results to be quite good based on F1 scores ranging between 0.71 and 0.83.
4. DISCUSSION
Originally, framing analysis was a part of content analysis. Researchers from the social sciences conducted it manually based on qualitative methods for text analysis, but recently quantitative methods have become a more popular instrument [26]. Nevertheless, some researchers question the quality of the quantitative analyses and point to the weakness of current text analysis methods in distinguishing the hidden meaning between the lines.
Methods for computer-assisted framing analysis exist for both inductive and deductive analyses; their summary and additional information are presented in Appendices A and B. Some of the considered inductive approaches are still mostly based on the agenda-setting properties of a frame rather than on framing itself. Framing attributes, being usually nouns, describe what to think about (the topic), but not how to think about it (the interpretation).
Frames are defined as salient pieces of information, and this salience comes from how frequently the media or politicians refer to a specific concept. A simple interpretation of this lies behind the most frequent words in a text corpus, which tend to be a common starting point for inductive analysis (see Section 3.1.1). On the one hand, this approach follows the idea of word salience, but on the other hand, it misses the semantic connection between neighboring words.
Several approaches compared news articles based on the word choice and labeling of each news outlet. They extracted significant lexical elements based on syntactic properties of the texts (Section 3.1.3), semantic properties (Section 3.1.4), or both (Sections 3.1.5 and 3.1.6). All the extracted information designated differences in word choice, but demanded additional qualitative interpretation of the words' polarity and interrelation. Moreover, the analyses did not intend to find how the perception of the same entity or concept changes depending on word choice, but rather searched for non-intersecting groups of words.
Frame construction with regard to word choice and labeling should meet the following requirements:
1. identify the main elements of a text, which could be answers to the 5W1H questions (see Section 2) or other interesting terms of the text;
2. distinguish and separate different frames based on the word choice of each category;
3. find the accompanying words that form the frame perception, i.e. sentiments.
Requirement 1 can be fulfilled with keyphrase extraction or Named Entity Recognition techniques; for requirement 2, a specific similarity measure should be applied to identify semantically related words; and requirement 3 can be addressed by sentiment analysis.
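A toy sketch of how the three requirements could fit together; all word lists below are invented, and a real pipeline would use proper keyphrase extraction or NER, word similarity measures, and sentiment models rather than fixed tables:

```python
# Hypothetical lexicons (illustrative only).
sentiment_lexicon = {"brutal": -1, "heroic": 1, "illegal": -1, "brave": 1}
synonyms = {"militants": "fighters", "freedom-fighters": "fighters",
            "terrorists": "fighters"}

def analyze(sentence):
    tokens = sentence.lower().replace(".", "").split()
    # Req. 1: candidate main elements (here: words with a synonym entry).
    entities = [t for t in tokens if t in synonyms]
    # Req. 2: map different labels of one concept onto a shared key.
    concepts = {t: synonyms[t] for t in entities}
    # Req. 3: accompanying sentiment words shaping the perception.
    tone = sum(sentiment_lexicon.get(t, 0) for t in tokens)
    return concepts, tone

concepts, tone = analyze("The brutal terrorists attacked the town.")
```

Even this crude sketch shows the intended outcome: the label "terrorists" is recognized as one choice of wording for a shared concept, and the negative accompanying word indicates the perspective.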
Deductive analysis approaches rely on classification methods (Section 3.2.1), clustering (Section 3.2.2), statistical methods (Section 3.2.3), or rule-based methods (Section 3.2.4). All methods depend on various properties of the already derived frames and on how the frames represent texts. Texts are usually represented as a table of attributes. The attributes (or features) are coded in a binary way or with TF-IDF. [37] suggested several features for text processing tasks, especially classification, that could also be tried in deductive framing analysis.
A coded set of articles remains the most utilized resource in deductive analysis; prior coded articles are especially required for a classification task. Typically, articles are coded manually, and intercoder agreement on the classes leads to good performance. A significant property of frames is a high quality of word choice that exclusively represents the framing devices. However, the small amount of manually coded articles remains a problem for classification, since, roughly speaking, the bigger the training data set, the better the classification results. To enable fully automated coding, one of the following steps is required: (1) formulate the code-book's attributes as keywords and then search for them in the text, or (2) extract keywords from the text and calculate similarity measures with the code-book's attributes.
Solution 1 could be addressed with concept search (C-Search) [22], whose main idea is to use semantic search and, if it fails, to switch to syntactic search. It is an Information Retrieval (IR) technique, and its evaluation showed better results than simple keyword search. For solution 2, keyword extraction techniques can be applied, e.g. [7]. Frames that are well coded regarding the word choice of each frame attribute are an obligatory requirement for any deductive approach applied afterwards.
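The semantic-first, syntactic-fallback idea behind C-Search can be sketched as follows. The synonym table is invented for illustration; the real system performs full semantic matching rather than word-list lookup:

```python
# Hypothetical table of semantically related words per code-book keyword.
synonyms = {"economy": {"economic", "financial", "fiscal"},
            "migration": {"immigration", "asylum", "refugees"}}

def concept_search(keyword, documents):
    """Return documents matching the keyword semantically; if none match,
    fall back to plain (syntactic) substring search."""
    related = {keyword} | synonyms.get(keyword, set())
    hits = [d for d in documents
            if related & set(d.lower().split())]
    if hits:                      # semantic match succeeded
        return hits
    return [d for d in documents  # syntactic fallback
            if keyword in d.lower()]

docs = ["Asylum applications rise sharply",
        "Fiscal policy under review",
        "Pre-migration checks tightened"]
```

Here a code-book attribute phrased as "migration" would retrieve an article that never uses the word itself, while the fallback still catches purely lexical matches.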
5. CONCLUSION
In this work we provided a comprehensive literature review of the existing computer-assisted framing analysis approaches, covering both inductive and deductive analyses. The considered approaches are semi-automated and usually include human interaction in the analysis pipeline. Moreover, only a few methods focus on word choice and labeling, and, to the best of our knowledge, none of them addresses the problem of identifying the word choice for a single concept. This work also suggested several ideas that could help solve the problem of framing by word choice and labeling for both inductive and deductive analysis.
6. REFERENCES
[1] Honest reporting: Lack of context bias.
http://honestreporting.com/news-literacy-defining-
bias-lack-of-context/.
[2] Media bias in strategic word choice.
http://www.aim.org/on-target-blog/media-bias-in-
strategic-word-choice/.
[3] Sigkdd: The community for data mining, data science
and analytics. http://www.kdd.org/.
[4] Termine software.
http://www.nactem.ac.uk/software/termine/.
[5] D. S. J. Allen. Media bias: 8 types [a classic, kinda],
2015.
[6] S. Ananiadou, D. Weissenbacher, B. Rea, E. Pieri,
F. Vis, Y. Lin, R. Procter, and P. Halfpenny.
Supporting frame analysis using text mining. In 5th International Conference on e-Social Science, 2009.
[7] A. Bougouin, F. Boudin, and B. Daille. Topicrank:
Graph-based topic ranking for keyphrase extraction.
In International Joint Conference on Natural
Language Processing (IJCNLP), pages 543–551, 2013.
[8] A. E. Boydstun, D. Card, J. Gross, P. Resnick, and
N. A. Smith. Tracking the development of media
frames within and across policy issues. 2014.
[9] B. Burscher, D. Odijk, R. Vliegenthart, M. De Rijke,
and C. H. De Vreese. Teaching the computer to code
frames in news: Comparing two supervised machine
learning approaches to frame analysis. Communication
Methods and Measures, 8(3):190–206, 2014.
[10] D. Chong and J. N. Druckman. Framing theory.
Annu. Rev. Polit. Sci., 10:103–126, 2007.
[11] S. R. Corman, T. Kuhn, R. D. McPhee, and K. J.
Dooley. Studying complex discursive systems. Human
communication research, 28(2):157–206, 2002.
[12] C. E. Crawley. Localized debates of agricultural
biotechnology in community newspapers: A
quantitative content analysis of media frames and
sources. Science Communication, 28(3):314–346, 2007.
[13] P. D’Angelo and J. A. Kuypers. Doing news framing
analysis: Empirical and theoretical perspectives.
Routledge, 2010.
[14] S. T. Dumais. Latent semantic analysis. Annual review
of information science and technology, 38(1):188–230,
2004.
[15] T. Dunning. Accurate methods for the statistics of
surprise and coincidence. Computational linguistics,
19(1):61–74, 1993.
[16] R. M. Entman. Framing: Toward clarification of a
fractured paradigm. Journal of communication,
43(4):51–58, 1993.
[17] R. M. Entman. Projections of power: Framing news,
public opinion, and US foreign policy. University of
Chicago Press, 2004.
[18] B. Fortuna, C. Galleguillos, and N. Cristianini.
Detection of bias in media outlets with statistical
learning methods. Text Mining, page 27, 2009.
[19] K. Franzi, S. Ananiadou, and H. Mima. Automatic
recognition of multi-word terms. International Journal
of Digital Libraries, 3(2):117–132, 2000.
[20] G. Axtell, A. Bartley, et al. Radford University Core Handbook. Radford University.
[21] D. M. Garyantes and P. J. Murphy. Success or chaos?
framing and ideology in news coverage of the iraqi
national elections. International Communication
Gazette, 72(2):151–170, 2010.
[22] F. Giunchiglia, U. Kharkevich, and I. Zaihrayeu.
Concept search: Semantics enabled syntactic search.
2008.
[23] I. Hellsten, J. Dawson, and L. Leydesdorff. Implicit
media frames: Automated analysis of public debate on
artificial sweeteners. Public Understanding of Science,
19(5):590–608, 2010.
[24] T. Joachims. Text categorization with support vector
machines: Learning with many relevant features. In
European conference on machine learning, pages
137–142. Springer, 1998.
[25] D. Kahneman and A. Tversky. Choices, values, and
frames. American psychologist, 39(4):341, 1984.
[26] J. Matthes. What’s in a frame? a content analysis of
media framing studies in the world’s leading
communication journals, 1990-2005. Journalism &
Mass Communication Quarterly, 86(2):349–367, 2009.
[27] J. Matthes and M. Kohring. The content analysis of
media frames: Toward improving reliability and
validity. Journal of communication, 58(2):258–279,
2008.
[28] M. E. McCombs and D. L. Shaw. The agenda-setting
function of mass media. Public opinion quarterly,
36(2):176–187, 1972.
[29] M. M. Miller. Frame mapping and analysis of news
coverage of contentious issues. Social Science
Computer Review, 15(4):367–378, 1997.
[30] D. Nadeau, P. Turney, and S. Matwin. Unsupervised
named-entity recognition: Generating gazetteers and
resolving ambiguity. 2006.
[31] V.-A. Nguyen, J. L. Boyd-Graber, and P. Resnik.
Lexical and hierarchical topic regression. In Advances
in Neural Information Processing Systems, pages
1106–1114, 2013.
[32] D. Odijk, B. Burscher, R. Vliegenthart, and
M. De Rijke. Automatic thematic content analysis:
Finding frames in news. In International Conference
on Social Informatics, pages 333–345. Springer, 2013.
[33] Z. Pan and G. M. Kosicki. Framing analysis: An
approach to news discourse. Political communication,
10(1):55–75, 1993.
[34] Z. Papacharissi and M. de Fatima Oliveira. News
frames terrorism: A comparative analysis of frames
employed in terrorism coverage in us and uk
newspapers. The International Journal of
Press/Politics, 13(1):52–74, 2008.
[35] S. Park, S. Kang, S. Chung, and J. Song. Newscube:
delivering multiple aspects of news to mitigate media
bias. In Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems, pages 443–452.
ACM, 2009.
[36] S. Piao, Y. Tsuruoka, and S. Ananiadou. Hyseas: A
hybrid sentiment analysis system. In Proceedings of
the Fourth International Conference on
Interdisciplinary Social Sciences, 2009.
[37] P. Przybyla, N. T. Nguyen, M. Shardlow,
G. Kontonatsios, and S. Ananiadou. Nactem at
semeval-2016 task 1: Inferring sentence-level semantic
similarity from an ensemble of complementary lexical
and sentence-level features. Proceedings of SemEval,
pages 614–620, 2016.
[38] M. Recasens, C. Danescu-Niculescu-Mizil, and
D. Jurafsky. Linguistic models for analyzing and
detecting biased language. In ACL (1), pages
1650–1659, 2013.
[39] M. Scott. Wordsmith tools 6. Oxford: Oxford
University Press, 2011.
[40] M. G. Sendén, S. Sikström, and T. Lindholm. ”she”
and ”he” in news media messages: pronoun use reflects
gender biases in semantic contexts. Sex Roles,
72(1-2):40–49, 2015.
[41] Y. Tian and C. M. Stewart. Framing the sars crisis: A
computer-assisted text analysis of cnn and bbc online
news reports of sars. Asian Journal of
Communication, 15(3):289–301, 2005.
[42] M. Touri and N. Koteyko. Using corpus linguistic
software in the extraction of news frames: towards a
dynamic process of frame analysis in journalistic texts.
International Journal of Social Research Methodology,
18(6):601–616, 2015.
[43] G. Tuchman. Making news: A study in the
construction of reality. 1978.
[44] W. van Atteveldt, T. Sheafer, and S. Shenhav.
Automatically extracting frames from media content
using syntactic analysis. In Proceedings of the 5th
Annual ACM Web Science Conference, pages 423–430.
ACM, 2013.
[45] B. Van Gorp. Where is the frame? victims and
intruders in the belgian press coverage of the asylum
issue. European Journal of Communication,
20(4):484–507, 2005.
[46] B. Van Gorp. Strategies to take subjectivity out of
framing analysis. Doing news framing analysis:
Empirical and theoretical perspectives, pages 84–109,
2010.
[47] J. Woelfel. Artificial neural networks in policy
research: A current assessment. Journal of
Communication, 43(1):63–80, 1993.
[48] J. Woelfel and N. Stoyanoff. Catpac: A neural
network for qualitative analysis of text. In annual
meeting of the Australian Marketing Association,
Melbourne, Australia, 1993.
14. Method name,
source
Information about the
data
Preprocessing Methods Output Résumé
PCA on cosine
similarity matrix
[28]
• Associated Press dispatches
• 12.07.1984 – 27.06.1995
• 1465 articles
• Keyword: wetlands
• Stop words and ambiguous words
manually removed
• Number of most frequent words
chosen by authors
• 1 document = 1 list
• Cosine similarity matrix of most
frequent terms co-occurrence
• PCA
• Hierarchical clustering on 3
eigenvector values
• Table with frames’ names and
corresponding framing devices
• Visualization of frames in 3D
space
• Semi-automated
• Agenda setting
PCA on covariance
matrix
[11]
• Lexis-Nexis database
• 1.01.1992 – 1.12.2004
• 1156 articles
• Keywords: GMO, agricultural
biotech*, etc.
• Stop words and ambiguous words
manually removed
• Number of most frequent words
chosen by authors
• 1 document = 1 list
• PCA of most frequent words with
varimax rotation,
• 8 most meaningful eigenvalues
selected
• Terms with loading ≥ 0.3 form a frame
• Table with grouped framing
devices
• Qualitative analysis to interpret
results and name frames
required
• Semi-automated
• Agenda setting
Self-organizing map
[17]
• CNN and BBC websites
• 1.03.2003 – 1.09.2003
• 730 articles
• Keywords: SARS
• Stop words and verbs removed
• 1 document = 1 list
• Top 40 of most frequent words are
manually ranked
• Self-organizing map -unsupervised
neural network – of most frequent
words
• Hierarchical clustering based on
Ward’s method
• Table with grouped framing
devices
• Qualitative analysis to interpret
results and name frames
required
• Semi-automated
• Agenda setting
Semantic network
[22]
• New York Times website
• 1980 – 2006
• 54 articles
• Keywords: artificial
sweetener, etc.
• Data normalization
• Stop words removed
• All document for 1 topic = 1 list
• Number of most frequent words
chosen by authors
• Cosine similarity matrix of most
frequent terms co-occurrence
• Elements ≥ threshold form a network
• Normalize similarity
• Graph visualization
• Frames are obtained by visual
interpretation
• Semi-automated
• Agenda setting
Keyword extraction
[41]
• Lexis-Nexis database
• 01.2010 – 08.2010
• 40 articles
• Keywords: Greece, economy,
debt, crisis, etc.
• Bag-of-words of an article
• Bag-of-words of all articles
• Log-likelihood ratio calculation • List of concordances of
keywords
• Full qualitative analysis required
• Semi-automated
• Agenda setting
• Word Choice +
labeling
Centering Resonance
Analysis
[33]
• Lexis-Nexis database
• 06.2006 – 06.2007
• 218 articles
• Keywords: terrorist attacks,
Iraq, Israel, Afghanistan
• Only noun-phrases left, other
words form connections
• Pronouns are dropped
• Stemming
• Central resonance analysis – frames
are based on words of the biggest
influence or centrality
• Network of shared/distinct
keywords between frames
• Qualitative analysis for result
interpretation required
• Semi-automated
• Agenda setting
• Word Choice
Latent Semantic Analysis [39]
• Data: Reuters website; 1996 – 1997; number of articles not specified; all articles used
• Preprocessing: stop words removed
• Methods: latent semantic analysis
• Output: ordered list of terms with their similarity values relative to a given word; qualitative interpretation required
• Résumé: automated; framing ±; word choice + labeling
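LSA amounts to a truncated SVD of the term-document matrix, after which term similarity is measured in the low-rank space; the matrix, terms, and counts below are invented:

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents)
terms = ["market", "stocks", "trade", "rain", "flood"]
X = np.array([
    [3, 2, 0, 0],   # market
    [2, 3, 0, 0],   # stocks
    [1, 2, 1, 0],   # trade
    [0, 0, 3, 2],   # rain
    [0, 0, 2, 3],   # flood
], dtype=float)

# LSA: truncated SVD projects terms into a low-rank latent space
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]          # term coordinates in latent space

def most_similar(word):
    """Terms ordered by cosine similarity to a given word in LSA space."""
    i = terms.index(word)
    sims = term_vecs @ term_vecs[i]
    sims = sims / (np.linalg.norm(term_vecs, axis=1) * np.linalg.norm(term_vecs[i]))
    order = np.argsort(-sims)
    return [(terms[j], round(float(sims[j]), 2)) for j in order if j != i]

print(most_similar("market"))  # 'stocks' ranks above 'rain' and 'flood'
```

The ordered similarity list is exactly the kind of output that then requires qualitative interpretation.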
Keyword-weight model [34]
• Data: “20 most publishing news providers”; 2007; 406 articles; “many important events”
• Preprocessing: bag-of-words; stop words removed; the structure of the text is employed
• Methods: news-structure-based extraction; keyword-weight model; aspect-based clustering
• Output: a system that shows a spectrum of articles with minimized framing bias
• Résumé: automated; framing; word choice
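The surveyed system’s exact weighting scheme is not reproduced here; as a generic stand-in, a TF-IDF-style keyword weight (term frequency × inverse document frequency) over invented documents can be computed as follows:

```python
import math
from collections import Counter

def tfidf_weights(documents):
    """Weight each keyword in each document by term frequency times
    inverse document frequency.  This is a generic stand-in: the
    surveyed keyword-weight model additionally exploits news structure."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({w: (tf[w] / len(doc)) * math.log(n / df[w]) for w in tf})
    return weights

docs = [
    ["president", "vetoed", "bill"],
    ["president", "signed", "bill"],
    ["storm", "hit", "coast"],
]
w = tfidf_weights(docs)
# 'vetoed' is document-specific, so it outweighs the common word 'president'
```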
Named entity recognition [35]
• Data: no information given
• Preprocessing: no information given
• Methods: named entity extraction; term extraction; sentiment analysis
• Output: output from three independent systems; qualitative interpretation required
• Résumé: semi-automated; framing; word choice + labeling
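Two of the three components can be mimicked with deliberately naive stand-ins — a capitalization heuristic in place of a trained NER model and a toy sentiment lexicon; everything below is invented for illustration:

```python
def naive_entities(tokens):
    """Very naive named-entity heuristic: capitalized tokens that are
    not sentence-initial.  Real systems use trained NER models."""
    return [t for i, t in enumerate(tokens) if i > 0 and t[0].isupper()]

POS, NEG = {"praised", "success"}, {"condemned", "failure"}  # toy lexicon

def lexicon_sentiment(tokens):
    """Toy lexicon-based sentiment score: positive minus negative hits."""
    words = [t.lower() for t in tokens]
    return sum(w in POS for w in words) - sum(w in NEG for w in words)

tokens = "Critics condemned the Senate bill".split()
print(naive_entities(tokens), lexicon_sentiment(tokens))
```

Combining the entity list with per-sentence sentiment is what lets such a system surface differently loaded labels for the same actor.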
APPENDIX
A. SUMMARY TABLE OF COMPUTER-ASSISTED INDUCTIVE FRAMING ANALYSIS METHODS
Each entry lists: method name and source; information about the data; preprocessing; methods; output; résumé.
Logistic Regression [7]
• Data: Lexis-Nexis database; 1990 – 2012; 9502 articles; keywords: smoking, immigration, same-sex marriage
• Preprocessing: manual frame coding at the frame level; features: presence or absence of each word of the code vocabulary; classes: presence/absence of a frame; one classifier per frame
• Methods: logistic regression
• Output: the obtained prediction is used as a measure of frame presence; visualization; trained model
• Résumé: semi-automated; agenda setting; word choice and labeling
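This setup — binary word-presence features, one binary classifier per frame, and the predicted probability as a graded measure of frame presence — can be sketched with a plain gradient-descent logistic regression (the toy data are invented):

```python
import math

def train_logistic(X, y, lr=0.5, epochs=200):
    """Batch-gradient-descent logistic regression.  X holds 0/1 features
    (presence/absence of code-vocabulary words), y holds 0/1 labels
    (presence/absence of one frame); one such classifier per frame."""
    n_feat = len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1 / (1 + math.exp(-z)) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    """Predicted probability, usable as a graded measure of frame presence."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

# Toy data: features = presence of the words ['ban', 'freedom']
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [1, 1, 0, 0]          # frame coded as present when 'ban' occurs
w, b = train_logistic(X, y)
```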
Ensemble of logistic regressions [31]
• Data: Dutch Lexis-Nexis database; 1995 – 2011; 5875 articles; keywords not mentioned
• Preprocessing: manual frame coding at the attribute level; features: bag-of-words + TF-IDF scores; classes: presence/absence of an attribute; one classifier per attribute
• Methods: factor analysis; ensemble of logistic regressions
• Output: the obtained prediction is used as a measure of frame presence; table comparing the predictions; trained model
• Résumé: semi-automated; framing; word choice + possibly labeling
Support Vector Machines [17]
• Data: websites of AJ, CNN, DN, and IHT; 31.03.2005 – 14.04.2006; 21552 articles, 675 left after preprocessing; keywords not mentioned
• Preprocessing: Porter stemmer; features: bag-of-words + TF-IDF scores; classes: newspapers’ names
• Methods: linear support vector machines
• Output: table with the BEP (break-even point) measure between each pair of news outlets; 2D visualization of a multidimensional scaling (MDS) representation; list of the most differing word choices; trained model
• Résumé: automated; framing; word choice + labeling
Supervised Hierarchical Latent Dirichlet Allocation [30]
• Data: GovTrack; 109th US Congress; 5201 + 3060 turns; keywords not mentioned
• Preprocessing: classes: label classes derived from prior knowledge
• Methods: supervised hierarchical latent Dirichlet allocation
• Output: visualized hierarchy of topics based around the given labels
• Résumé: automated; framing; word choice + labeling
Hierarchical clustering [25]
• Data: Lexis-Nexis database; 1992 – 2001; 1000 articles; keywords: biotech, genetic, genome, DNA
• Preprocessing: manual frame coding at the attribute level; features: intercoder agreement
• Methods: Ward hierarchical clustering
• Output: table with attributes and values showing cluster/frame membership; qualitative analysis required to interpret the results and name the frames
• Résumé: semi-automated; framing; word choice + labeling
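Ward’s method merges, at each step, the two clusters whose fusion least increases the total within-cluster variance; assuming SciPy is available, clustering some invented attribute scores looks like:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Invented scores for six coded attributes (rows = attributes,
# columns = e.g. agreement values from two coder pairs)
attributes = ["risk", "danger", "progress", "benefit", "ethics", "morals"]
scores = np.array([
    [0.90, 0.80],
    [0.85, 0.80],
    [0.10, 0.20],
    [0.15, 0.10],
    [0.50, 0.90],
    [0.55, 0.85],
])

# Ward linkage builds the merge tree; fcluster cuts it into 3 clusters
Z = linkage(scores, method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")
print(dict(zip(attributes, labels)))  # attributes grouped into 3 clusters/frames
```

Each resulting cluster of attributes is a candidate frame that still needs a qualitative reading to be named.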
Homogeneity analysis [44]
• Data: Flemish- and French-language newspapers; 20.10.2000 – 29.04.2001 and 01.09.2002 – 31.08.2003; 1489 articles; keywords: refugees/asylum seekers
• Preprocessing: manual frame coding at the attribute level; intercoder agreement calculation – attributes with agreement < 0.6 are discarded; features: intercoder agreement
• Methods: homogeneity analysis of the framing attributes; calculation of an index of frame presence
• Output: visualization of the homogeneity analysis; visualization of the comparison between newspapers; visualization of the frames’ coverage dynamics; qualitative analysis required to interpret some of the results
• Résumé: semi-automated; framing; word choice + labeling
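A standard coefficient for such intercoder agreement is Cohen’s kappa (the paper’s exact measure is not restated here); the coder judgments below are invented, and the 0.6 cutoff follows the preprocessing step above:

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' binary judgments on one attribute:
    kappa = (p_o - p_e) / (1 - p_e), with observed agreement p_o and
    chance agreement p_e."""
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    p_a1 = sum(coder_a) / n            # coder A's rate of coding '1'
    p_b1 = sum(coder_b) / n            # coder B's rate of coding '1'
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (p_o - p_e) / (1 - p_e)

# Invented judgments for one framing attribute across ten articles
a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 0]
b = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0]
kappa = cohens_kappa(a, b)
keep_attribute = kappa >= 0.6   # attributes below 0.6 are discarded
```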
Semantic Network Analysis [43]
• Data: Lexis-Nexis database; 27.12.2008 – 20.01.2009; 20414 articles; keywords: Gaza
• Preprocessing: parsed sentences of the attributes and the texts
• Methods: rule-based semantic network analysis
• Output: table with measures for each attribute found; trained model
• Résumé: semi-automated; framing; word choice + labeling
B. SUMMARY TABLE OF COMPUTER-ASSISTED DEDUCTIVE FRAMING ANALYSIS METHODS