Putting News in a Perspective: Framing by Word Choice
and Labeling
Anastasia Zhukova
University of Konstanz
Universität str., 10
Konstanz, Germany
anastasia.zhukova@uni-konstanz.de
ABSTRACT
While following the news, one can notice that the same story can have a different impact depending on which news outlet tells it. One reason for this is how the facts are framed. Framing is described by the communication sciences as an instrument that influences how people perceive, interpret, and convey information. It can be achieved through specific word choice and labeling that describe an event or problem from a particular perspective, e.g. positive or negative. In order to derive a frame, social scientists usually perform a manual qualitative analysis, but recently computer-assisted quantitative approaches have become an essential way of conducting framing analysis. This work provides a literature review of the existing frame derivation methods that address the problem of word choice and labeling.

Keywords
Frame analysis, news analysis, word choice, labeling
1. INTRODUCTION
Consider reading news headlines reporting on the same event, a July 2015 incident in which Palestinians stoned a car carrying several police officers. The following two leave contradictory impressions: Irish Times: "Palestinian protester shot dead in West Bank" and Reuters: "Israeli officer kills stone-throwing Palestinian youth in West Bank" [1]. Reuters' headline provides the subject, action, object, and reason for the performed action; the Irish Times applies the label "protester" to a youth and switches the focus of the story towards the object, resorting to the passive voice and omitting the subject of the story. The word choice of "shot dead" compared to "kills" may also evoke visual associations and change the perception of the story. The overall impression can lead a reader to interpret the Irish Times headline more negatively than Reuters'. Different news framing is the reason.
Framing is a conceptualization of the way people organize, perceive, and communicate information. It is an instrument of political science that communicates ideas and messages, and "defines issues" [46]. The term framing was first introduced by Tuchman [43]. A frame is a framing entity and is defined as a strong communicative tool that, in a compact way, unites a set of ideas that need to be transferred to people, including problems, judgments, actions, causes, and solutions. In [16], Entman suggested a description of framing as "selecting some aspects of a perceived reality and making them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation".
Usually a frame is a pattern of the most important parts of the messages that members of political parties, newspapers, or individuals convey in pursuit of an underlying interest. A frame should be a rather short and simple message that reflects a particular event or people [17], and is memorable and reproducible, which enables its further spread, use, and evolution.
Politicians use frames to motivate people to act within a frame's boundaries and its idea [10], affecting attitudes and behaviors. Frames convey messages with symbolic elements and metaphors, utilizing stereotypes, word choice, and labels that together consolidate an idea into an entity that [46] calls a package. This package embeds cultural and individual perception and resonates most strongly in a person's mind. Specifically chosen words have a great impact on people by appealing to well-known associations, images from previous experience, and lexical tones.
The definition of a frame varies in its interpretation, and consequently, social scientists define and apply several methods for frame derivation. Regardless of specifics, all methods are either inductive or deductive. In inductive analysis, a set of texts is analyzed in order to define a frame as a piece of the message that these texts or news convey [46]. The task is to find important or, according to Entman, "salient" information, either by the frequency of influential words or by the words' impact on one's perception of the text. Deductive analysis works with predefined frames and existing codebooks which describe a frame, and determines whether evidence of a frame is found in a text.
The main challenge for computer-assisted frame analysis is finding a frame and its elements, as frame analysis remains mostly qualitative. Therefore, the research questions of this paper are: (1) how do scholars approach computer-assisted framing analysis, and (2) which methods focus the analysis on constructing or finding frames based on word choice and labeling?

The paper is organized as follows: we start with an overview of the forms of frame analysis and its general properties, then review the existing approaches and methods for finding a frame, and discuss current and possible solutions for framing with respect to the word choice and labeling problem.
2. FRAME ANALYSIS
Framing is a process of issue conceptualization and uses a frame as a tool. A frame is a system of organized ideas and messages, which are called attributes or devices [33]. Each device consists of highly influential words, forming a structure that acts as a trigger on people, aiming at a specific reaction based on a particular cultural, symbolic, and psychological background. [16] discussed information as influential and important, or as he defined it, "salient": first, based on word frequency; if one wants to emphasize a piece of information, it needs to be repeated. Second, the use of familiar symbols reinforces the perception of the information and triggers fast interpretation of well-known concepts. All these methods activate a frame and make it prominent [46].
Framing attributes can be compared to a lens, providing specific boundaries and a perspective on a set of views [13]. [33] suggested four categories of structures for framing devices:
1. syntactical structure — refers to the inverted pyramid of news discourse and its structural elements, i.e. headline, lead, and main body, where the influence of words and phrases is inversely related to their position in the text — technical devices;

2. script structure — represents the sequence of activities and components that a single event can consist of, i.e. the 5 W and 1 H questions: who, what, when, where, why, and how — framing devices;

3. thematic structure — defines whether an article is a theme or a subtheme — framing devices;

4. rhetorical structure — consists of word and stylistic choices, metaphors, exemplars, catchphrases, depictions, labels, citations, and visual images, which all together increase the salience of a given point of view and substitute facts with their interpretation — rhetorical devices.
The word choice aims at a predefined reaction and way of interpretation, thus influencing the decision-making process [25]. Specific word choice changes the polarity or valence of the whole story; therefore, framing can introduce bias when operating with subjective words and tones [38]. This can be done by addressing people with specific terms and words that correspond to their background and previous experience. One concept can be represented by different words, and the selection of a particular word combination depends on the aim of a frame. Consider the examples: "Heart-wrenching tales of hardship faced by people whose care is dependent on Medicaid" is heavily biased towards pity compared to "Information on the lifestyles of Medicaid dependents" [2]. Well-informed people tend to have a solid opinion and position on various questions and will distinguish the difference between the two examples, but others can be prone to influence.
Labeling of an opponent intends to devalue his opinion with a strong negative association [20]. The most popular labels refer to ideas, political organizations, and activists, and form a bias by label [5]. Applying a label triggers fast image construction, and not rarely leads to image distortion compared to the rest of the content. For example, when conservatives are called "far right", it leads to a negative perception of the whole story; meanwhile, radicals might obtain more positive labels, and the picture will change correspondingly.
Framing analysis can be classified as follows [26]: (1) framing versus agenda-setting; (2) qualitative (text-based) or quantitative (number-based); (3) manual or computer-based frame coding; (4) inductive or deductive approaches.
Framing and agenda-setting. A frame, depending on the forms of the devices it consists of, represents either framing or agenda-setting. The main difference lies in their functions: agenda-setting tells people what to think about, while framing influences how to think about it [28]. In other words, agenda-setting provides the topics and themes that news cover. Framing plays the role of a second level of agenda-setting, and therefore describes the reasons for news elements' salience and provides an interpretation of news stories.
Qualitative and quantitative analysis. A typical framing analysis is based on qualitative analysis, where coders derive a frame from a set of articles. A coder is a social scientist who reads news articles, highlights the most significant information, finds actors and actions, identifies the intonation, metaphors, and lexical choices in the text, and then groups the obtained frame elements by generalizing and summarizing them [46]. The overall process tends to be slow and applicable only to a small number of articles. Unlike qualitative analysis, quantitative analysis describes statistical word properties and interrelations between words. Very basic approaches rely on word frequency, but the general idea is to convert text into numerical values and apply subsequent transformations and calculations.
Manual and computer-based frame coding. Advantages and disadvantages accompany both manual and computer-based frame coding. A lack of "fully developed operational procedures" with respect to ambiguous frame construction was named in [33] as a reason why computer-based approaches were impossible. Metaphors and turns of speech chosen as framing devices [45, 27] could also be infeasible to handle, as they carry hidden meaning and require human interpretation.

Nonetheless, manual framing analysis has its own drawbacks. First, it is time-consuming, and it is hard to analyze texts at a large scale. Second, framing analysis can be biased and depends on the psychological and cultural background of a researcher. Thus, computer-assisted analysis tends to be more objective and reliable [41].

An analysis conducted with a computer-based approach can be repeated by other researchers, and the algorithms will yield the same results across repeated runs, which is a concern for manually coded results. Moreover, for the computer-assisted approach, [10] states that it is important to predefine a universe of words describing a frame: it simplifies word selection during framing and makes the vocabulary more specific to the texts' topic, thus more likely to correctly find a frame, at least in deductive analysis.
Inductive and deductive approaches. Framing analysis can also be divided into two approaches: inductive and deductive. The inductive approach tries to reconstruct frames from a given set of texts. It aims at finding both the general conveyed idea and its framing attributes. In deductive analysis, a frame is predefined and the task is to search for the framing devices that confirm the presence of a frame [10].
3. RELATED WORK
Though framing analysis is typically a task performed manually by social scientists, research into computer-assisted approaches has been growing in recent years. It is important to highlight the term "assisted", because currently a lot of manual interaction remains, and some steps of the analysis rely on a researcher's knowledge and experience in using or interpreting the results. Nevertheless, the following section aims at giving an overview of the utilized inductive and deductive approaches.
3.1 Inductive analysis
Inductive analysis involves procedures aiming at reconstructing or deriving frames from a set of given texts by finding similar features, which in Section 2 are defined as framing devices. Usually there is no prior information given except the criteria for news article preselection.
3.1.1 Most frequent words
The term "salient word", in the context of content analysis, usually describes words with a high frequency in the text. In framing analysis, which originated from content analysis, some approaches suggest that as long as a word remains frequent, it is a keyword and influences the content and perception of the information presented there. The methods described below are based on this word property, but apply various post-processing operations.
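The shared first step of these methods, selecting the top frequent words after stopword removal, can be sketched as follows (a minimal illustration; the function name and the tiny stopword list are ours, not taken from any of the reviewed tools):

```python
from collections import Counter
import re

# Illustrative stopword list; real tools ship much larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that"}

def top_frequent_words(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)
```

The candidate terms returned here would then be pruned or merged manually, as the reviewed approaches do.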
PCA on cosine similarity matrix
One of the first scholars who suggested and performed computer-assisted framing analysis was [29], who called this approach frame mapping. The scholar wanted to analyze the frames of two competing groups of stakeholders, how they are covered in the news, and which terms and word choices are associated with them. The suggested method mainly relies on frame derivation based on clustering; the VBPro software family was used in the analysis. Before the analysis, several preprocessing steps were applied. Stopwords were removed before a word list with corresponding frequencies was obtained for each document. The most frequent words were selected as candidates for frame terms; their number was based on the expertise of the researcher. Then, terms with the same root were combined into one "word term". This transformation is similar to stemming, but additionally these "word terms" included synonyms from a manually developed dictionary. If a term had several meanings and some of them applied to a word within a text, then this word was removed, or manually tagged via VBPro to distinguish the meanings. Because the aim of the research was to identify the specific terms that co-occur with each stakeholder, the researchers manually marked each document by adding a new word with a special symbol into the word list. The last step was to compute the cosine similarity matrix between all pairs of documents: if terms always co-occurred, the matrix value was 1, and 0 otherwise.
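The pairwise cosine similarity computation can be sketched in pure Python (a simplified stand-in for what VBPro computes; names are ours):

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two term-frequency dicts."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

def similarity_matrix(docs):
    """Pairwise cosine similarity over bag-of-words vectors, one per document."""
    vecs = [Counter(d.split()) for d in docs]
    return [[cosine(a, b) for b in vecs] for a in vecs]
```

Identical term distributions score 1 and disjoint vocabularies score 0, matching the extremes described above.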
An eigenvalue extraction aimed at determining the most frequent pattern of co-occurrences between words. The result of this analysis step was a list of words and the eigenvectors associated with them; using a multidimensional space, the values can be plotted on a concept map. Words that co-occur together are plotted closer to each other. The large quantity of terms led to a very dense plot, and therefore the words were clustered using an agglomerative hierarchical clustering method with cosine similarity as the distance metric. On each iteration, the algorithm grouped pairs of similar objects until it reached one united cluster. The words which formed the upper-level hierarchies named the obtained frames.
Figure 1: Results of inductive frame derivation conducted
by hierarchical clustering [29]
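The iterative merging can be sketched as a greedy agglomerative procedure (a simplified single-linkage variant written by us for illustration; the original work used a standard hierarchical clustering implementation):

```python
def agglomerate(sim, k):
    """Greedy agglomerative clustering on a similarity matrix:
    repeatedly merge the pair of clusters with the highest
    single-linkage similarity until k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best, pair = -1.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: similarity of the closest pair of members.
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters
```

Running it to completion (k = 1) reproduces the "one united cluster" endpoint; stopping earlier yields the frame groupings.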
The authors suggested that labels inserted into the texts would be described by regular words if these labels were clustered in the early stages. Figure 1 shows the obtained clusters, which refer to frames, the words contained in the clusters, and the eigenvector values used later to plot the frames in 3D space. The inserted labels also had their own clusters and corresponding eigenvector values. Figure 2 shows two resources, represented by Property-Owner Advocates and Conservation Advocates, and frames whose proximity to the resources depicts how each resource is framed.
Figure 2: Derived frames and their comparison to the stud-
ied resources [29]
The obtained results present frames more as agenda-setting, describing each resource with respect to the topics that each source or actor of the discussion chose to support their position. However, the results do not concentrate on the lexical choice of the frames or on whether a frame has a positive, negative, or neutral influence on the perception of the topic or actor.
PCA on covariance matrix. [12] conducted a similar computer-based analysis of biotechnology and GMO products, with the goal of frame reconstruction, frame comparison, and analysis of lexical choice. The main idea of using the most frequent words came from [29], and for the analysis they used the WordStat and SPSS software. They also manually selected the top most frequent words with respect to their meaningfulness, interpretability, and absence of ambiguity, and the number was limited to 130 words. Having a list of frequencies of the preselected words per document, the researchers computed two covariance matrices, one for each newspaper. Principal component analysis with a varimax rotation was applied as a tool, which searched for relationships between words and also revealed the latent structure. Eigenvalues higher than 1 resulted in 8 meaningful frames both for the Missouri and the Northern California newspapers. If a word had a loading factor greater than or equal to 0.3, then this word was part of a cluster.
Figure 3: Frames formed by PCA on the most frequent
words between documents [12]
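The final assignment step, turning component loadings into frame word lists via the 0.3 cutoff, can be sketched as follows (the data layout and function name are ours; the PCA itself is assumed to have been run already):

```python
def frames_from_loadings(loadings, threshold=0.3):
    """Assign each word to every component on which its absolute
    loading meets the threshold, mirroring the 0.3 cutoff used
    to turn PCA components into frame word lists."""
    frames = {}
    for word, comps in loadings.items():
        for comp, value in comps.items():
            if abs(value) >= threshold:
                frames.setdefault(comp, []).append(word)
    return frames
```

Note that a word can legitimately land in several frames if it loads strongly on more than one component.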
The researchers named the frames manually and compared the word choice between similar frames in the newspapers. They stated that the agriculture topic is framed differently by the newspapers from the two states, and as seen in Figure 3 some differences exist (6 out of 8 frames are the same), but it is hard to discuss how a specific word choice changed the perspective on the issue. Most of the words are neutral nouns that describe a frame more as agenda-setting. A hint of word valence is given by the label "Frankenfoods", which refers to GMO plants as "mutant food", but the term lacks an explanation in terms of sentiment, and therefore obtains meaning only with the help of manual qualitative analysis.
Word co-occurrence
Self-organizing map. Yan Tian and Concetta M. Stewart conducted a news framing analysis of the SARS crisis, aiming at finding the frames that CNN and BBC applied to cover the topic [41]. The CatPac (Category Package) software [48], which was used as the analysis tool, is based on a self-organizing map, an unsupervised artificial neural network, and aims at finding semantic relationships between concepts (words) that are represented as neurons. One concept can have the same meaning for several words. The similarity of concepts is based on the word patterns used in the text.

As a preprocessing step, the researchers constructed one list of words representing all articles per newspaper; the program selects a range of the most frequent words and removes stop-words and verbs, and the final list of words was corrected by a researcher. The neural network is trained with this word list: a sliding window of a selected size runs over a text, and if two terms co-occur, the weight of their similarity is increased with respect to the learning rate, the activation values of each node (word), and the previously obtained weight [47]. The result is a matrix of trained weights, and in order to obtain frames, Ward's hierarchical clustering is applied. The results yielded two dendrograms, where the clusters were named manually.
Figure 4: Result of inductive analysis with Self-organizing
map
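The sliding-window weight update can be sketched as follows (a strong simplification we wrote for illustration: it ignores CatPac's activation values and uses a bare Hebbian-style rule, nudging each co-occurring pair's weight toward 1 by the learning rate):

```python
def train_cooccurrence(tokens, window=3, lr=0.1):
    """Hebbian-style co-occurrence training sketch: every time two
    words fall inside the sliding window, move their connection
    weight toward 1 by the learning rate."""
    weights = {}
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1 : i + window]:
            if w1 == w2:
                continue
            key = tuple(sorted((w1, w2)))  # undirected pair
            old = weights.get(key, 0.0)
            weights[key] = old + lr * (1.0 - old)
    return weights
```

Frequent co-occurrence saturates a weight toward 1, so the trained matrix can feed directly into a hierarchical clustering step like the one described above.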
The results in Figure 4 depict the obtained frames and their constituents. As already seen in the previous approaches, the extracted examples of word choice represent keywords. They show an agenda-setting structure, introducing sub-topics of the crisis as various perspectives, rather than answering the question of how to think about the issue and what the difference in interpretation of the topic is.
Semantic network. [23] conducted an analysis of the impact of artificial sweeteners in the media with the assistance of the TextSTAT and Pajek programs. The authors suggested the construction of semantic maps to reveal hidden word meaning, which they called implicit frames. Texts were represented as lists of the 100 most frequent words, and to derive a similarity measure between words, the scholars calculated cosine similarity matrices for each period they were interested in. In order to find the constituents of a frame, a threshold was derived from the mean cosine value of the lower triangle of the obtained matrix. Multidimensional scaling can be applied to reduce dimensionality, but the authors plotted the matrix as a network, where nodes were words, the number of edges represented semantic relations to other words, and the length of an edge represented the similarity of the words. Spatial proximity of terms led to the clusters, or frames, depicted in Figure 5.
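The thresholding rule, keeping only edges above the mean of the lower triangle, can be sketched as follows (function name and data layout are ours):

```python
def edges_above_mean(sim, words):
    """Keep only word pairs whose cosine similarity exceeds the
    mean of the lower triangle of the similarity matrix, the
    thresholding rule described for the semantic network."""
    lower = [sim[i][j] for i in range(len(sim)) for j in range(i)]
    threshold = sum(lower) / len(lower)
    return [(words[i], words[j], sim[i][j])
            for i in range(len(sim)) for j in range(i)
            if sim[i][j] > threshold]
```

The surviving edges define the network whose spatial layout yields the clusters.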
Figure 5: An example of a semantic network used to form frames around the term "artificial sweetener" [23]

Based on Figure 5, it is hard to derive meaningful frames. It depicts the relations, and the obtained results could tell how the words are related to each other, but neither the number of connections nor their length gives an objective indication of how to read it. Moreover, the words used for the nodes describe the agenda-setting of this problem in the newspapers, not how this information can be interpreted.
3.1.2 Keyword extraction
[42] gave an overview of linguistic instruments for frame construction with WordSmith [39]. They stated that using keywords follows Entman's description of a frame in terms of the salience of words. "Keyness" was based on a log-likelihood measure that compares the number of occurrences of a given word in the corpus with a reference corpus representing the language of the text [15]. This approach is more sophisticated than using just a simple list of the most frequent words, and takes into account the context in which the word choice is used. Nevertheless, the keywords were not said to represent a frame, but helped to establish its key topic and central meaning. Word concordance extracted the words to the left and right of the keyword, thereby broadening the context in which the keyword was used and revealing candidates for framing devices. Frame construction itself was based on qualitative methods, which formed the frame packages suggested by [45].
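A common form of the log-likelihood keyness statistic (Dunning's formulation, which we assume here; the paper does not spell out its exact variant) can be sketched as:

```python
import math

def keyness(a, c, b, d):
    """Dunning log-likelihood keyness of a word: a/b are the word's
    counts in the study/reference corpus, c/d are the corpus sizes."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll
```

A word distributed proportionally across both corpora scores 0; over-represented words score higher, which is what makes them "key" for the topic.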
Performing a substantial part of the analysis manually makes this research a semi-automated approach, but it benefits from a statistically confirmed selection of framing devices. Word choice is based on the predefined frame components, but the frame coding itself and the determination of each frame's word choice depend on manual coding.
3.1.3 Centering Resonance Analysis
[34] performed a comparative analysis of terrorism coverage in the UK and the USA. The scholars used Centering Resonance Analysis (CRA) [11] on the whole set of documents. CRA finds the most important words with respect to their frequency and influential positions. CRA creates a network of the objects and subjects of the text, and therefore selects nouns and noun phrases as the nodes of this network. Other parts of speech, e.g. verbs, are excluded from the main components, but are used to link nodes. The researchers stated that "nouns denote conceptual categories that provide more salient discourse information than verbs" [11]. A word has a high betweenness centrality, which is represented "by the number of times the words were linked in the text according to the rules above". The relative influence of a word is based on the shortest paths between the other nodes of the network that pass through it, and is calculated as
and is calculated as
Ii
T
=
X
j<k
gjk(i)/gjk
[(N − 1)(N − 2)/2]
where gjk is the number of shortest paths connecting the
jth
and kth
words, gjk(i) is the number of those paths con-
necting word i, and N is the number of words in network.
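The influence measure can be computed directly from its definition via breadth-first shortest-path counting (a sketch we wrote from the formula above; an unweighted, undirected network is assumed):

```python
from collections import deque

def bfs_counts(adj, s):
    """Shortest-path distances and path counts from source s."""
    dist, sigma = {s: 0}, {s: 1}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                q.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def influence(adj, i):
    """Relative influence I_i: the fraction of shortest paths between
    other word pairs that pass through word i, normalised by the
    (N-1)(N-2)/2 possible pairs."""
    nodes = list(adj)
    n = len(nodes)
    data = {s: bfs_counts(adj, s) for s in nodes}
    total = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            j, k = nodes[a], nodes[b]
            if i in (j, k):
                continue
            dj, sj = data[j]
            if k not in dj or i not in dj:
                continue
            dk, sk = data[k]
            # i lies on a shortest j-k path iff distances add up exactly.
            if dj[i] + dk[i] == dj[k]:
                total += sj[i] * sk[i] / sj[k]
    return total / ((n - 1) * (n - 2) / 2)
```

On a three-word chain a–b–c, the middle word carries every shortest path between the endpoints and scores 1, while the endpoints score 0.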
An example of the obtained results of the comparison between two newspapers is shown in Figure 6. The words of each newspaper are grouped so that the more interrelated ones appear closer to each other, yielding frames. The betweenness coefficient was used to measure the similarity of the obtained frames and also of the words in a frame, but the researchers did not provide these numbers to compare the results. [21] conducted a similar analysis based on CRA, but their results did not provide any additional input for comparing results.
Figure 6: An example comparison between similarity and
difference of framing terrorism between newspapers of two
countries [34]
The whole analysis follows a computer-based approach, but was evaluated by manual discourse analysis. The results show the coverage of topics within the issue. Some words can even give an overview of the valence of the news coverage, but because some parts of speech were omitted and adjectives did not receive enough influence in the network, it is hard to derive the specific framing devices that would describe a frame.
3.1.4 Latent Semantic Analysis
[40] based the detection of the "he" and "she" frames on Latent Semantic Analysis (LSA). LSA [14] is a method from Information Retrieval that is also frequently used in social science applications. Its key feature is the ability to detect words that represent similar concepts, synonyms, or semantically related concepts. It represents texts as a matrix of words in texts, with either word frequencies or TF-IDF scores as values. The main objective was to determine how men and women were represented in media news in 1996-1997 and which terms were chosen to describe them. The results are shown in Figure 7, where semantically related terms are shown together with their cosine similarity.
The scholars wanted to test the valence of the context around the two pronouns, and the expectation that the pronoun "he" would be covered more positively than "she" was confirmed by a univariate analysis of variance (ANOVA), with valence as the dependent variable and gender as the independent variable. With this analysis they also confirmed the hypothesis that the pronoun "she" would have more gender-determined labels: indeed, in Figure 7 we see that labels such as "mother", "woman", and "girl" have high similarity values and can represent one entity and its characteristics.

Figure 7: Results of semantically related words and cosine similarity values for the defined frames [40]

All in all, though the context of the pronoun "he" was covered 9 times more frequently than that of "she", the analysis allowed the comparison of word instances with respect to their coverage. The interpretation of word valence is based on a qualitative analysis of the obtained word choice, and is not automatically presented to the users.
3.1.5 Keyword-weight model
The goal of [35] is to address the problem of media bias and also to overcome the tendency of a frame to narrow a topic to a specific perception. They proposed the NewsCube system, which utilizes (1) a keyword-weight model and (2) framing cycle-aware clustering. The keyword-weight model is discussed as an optimal solution between simple keyword extraction by frequency and complex syntactic and semantic parsing, which often yields ambiguous results. The architecture of the system is depicted in Figure 8.
Figure 8: NewsCube architecture [35]
The importance of words depends on the news pyramid structure: the most prominent words and the salient information appear in the headline, then the sub-head, the lead, and only then in the main text. Therefore, the suggested news structure-based extraction not only counts word frequencies but also weights them according to the location of a word. The results are normalized with respect to the length of the structural element where the word was found.

Figure 9: ASSIST software architecture [6]
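The structure-weighted scoring can be sketched as follows (the weight values are hypothetical, chosen by us to illustrate the decreasing influence down the pyramid; the paper's exact weights are not reproduced here):

```python
# Hypothetical weights: influence decreases down the news pyramid.
WEIGHTS = {"headline": 4.0, "subhead": 3.0, "lead": 2.0, "body": 1.0}

def structure_weighted_scores(article):
    """Score words by the weight of the structural element they appear
    in, normalised by the element's length in tokens."""
    scores = {}
    for element, text in article.items():
        tokens = text.lower().split()
        for t in tokens:
            scores[t] = scores.get(t, 0.0) + WEIGHTS[element] / len(tokens)
    return scores
```

A word in a short headline thus outscores the same word buried in a long body paragraph, capturing the inverted-pyramid intuition.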
For the frame derivation they used the concept of aspect coverage over time in the framing cycle. It comes from a property of news issues: when an issue has just appeared, all news media use the same source of information covering the basic facts, and then extend it with additional sources. The additional sources highly depend on the supported bias. The framing cycle-aware clustering focuses on differentiating the head group of common articles from the tail group by calculating commonness and uncommonness between articles and then splitting them in the 2D space. The measures of (un)commonness are based on cosine similarity. The obtained values depend on the keyword weights and the corresponding commonness and uncommonness of keywords within an article, which describe how often the keywords appear in the considered set of articles.
With respect to the research question, the keyword-weight model improves on the previously observed solutions of simple most-frequent-word extraction in Section 3.1.1. On the other hand, the approach simplifies the model to the syntactic structure, avoiding the extraction of semantic information and rather utilizing it only as cosine similarity. It allows readers to get a more objective scope of the news articles on an issue, but does not explain or construct the frames that change perspective through a particular word choice.
3.1.6 Named entity recognition and sentiment analysis
[6] suggested the ASSIST system for approaching the framing analysis problem. Their solution consists of three modules: named entity extraction, term extraction, and sentiment analysis, thus forming a text mining platform to support framing analysis. The architecture of the ASSIST software is represented in Figure 9.

Named entity recognition (NER) allows a user to identify the main roles in the text. The authors pointed out that the roles and locations typically used in NER alone are not sufficient for their task, and therefore they expanded the entity types and recognized 26 categories. This resulted in better recognition of the main topics or frames in the text and also in a semantic annotation of the words, but it also depended heavily on the algorithm of the particular NER implementation. The BaLIE semi-supervised NER [30] used in ASSIST faced the problem of assigning categories to the same word belonging to different entity types, e.g. a name and a town.
Term extraction is similar to keyphrase extraction. It focuses on the salient linguistic elements that form a specific topic or concept [19, 4]. These concepts can also represent the word choice and labeling selected by the author of a news article to describe an event or situation. TerMine combined the ideas of term formation patterns and statistical information about a term. The authors provided examples where labels such as "Big Brother" and "DNA database" were extracted from newspaper articles about ID cards.
The sentiment analyzer was the HYSEAS software [36]; it determined the tone of the words used in the text and produced a summarized score per sentence. The module fulfilled the task of determining the positive, negative, or neutral effect on the perception of an article, though it faced accuracy problems, especially with misclassified neutral sentences.

The suggested system was developed to aid social scientists in performing semi-automated framing analysis. The outputs of the modules required further qualitative analysis; the system did not include an aggregated result-analyzing module or visualization, but rather provided three disjoint outcomes for further processing.
3.2 Deductive analysis
Deductive analysis solves the task of determining specified frames in a set of given texts. The methods either treat a frame as a whole, or split it into framing devices and try to find a frame by its parts. For the following approaches, some news articles are already coded, mostly manually, and the task of computer-assisted deductive analysis is to increase the speed and accuracy of the frame search over the remaining scope of articles.
3.2.1 Classification
Classification is a Machine Learning (ML) task of identifying the category of observations based on a given set with known membership. In order to train a model, one needs a dataset with given labels which specify class membership. Then, roughly following the Knowledge Discovery in Databases (KDD) pipeline [3], model features are constructed, resulting in a final dataset split into two parts for training and testing purposes.
Logistic Regression
The research in [8] measures how much each of 16 frames is present in the smoking, immigration, and same-sex marriage issues. A binary logistic regression classifier was trained for each frame, resulting in 16 × 3 = 48 classifiers in total. A binary classifier decides whether a frame is present or not. As features for the model, the researchers suggested a binary code for each word in a framing code book: a word found in an article has value 1, and 0 otherwise. The prediction result is a real value that measures the probability of a frame being present in the considered article. Classification results for the smoking issue are depicted in Figure 10.
The area under the curve (AUC) is a standard measure of classification correctness: it "expresses the probability that the classifier will rank a positive document above a negative document" [32]. The researchers used it to show that logistic regression in this approach performed better than a random classifier, and that some devices have more prominent words and are easier to distinguish than others.

Figure 10: Accuracy of frame search with logistic regression for the smoking issue [8]

Figure 11: Highest-impact words in classifiers for the smoking issue [8]
The most significant words of the best-detected frames can be specified. To do this, it is suggested to calculate the product of the learned coefficient for a feature (word) and the feature's average value in the testing set. This value favors frequent terms and lowers the influence of infrequent ones. Figure 11 depicts an example of the three most represented frames and their framing devices for the smoking issue.
The results confirm the importance of well-defined framing devices and of the code book. The number in brackets in Figure 10 shows the prevalence of each frame in the texts, which determines the order in which frames are depicted; there is no obvious dependency between frame prediction and prevalence. The accuracy of a classifier therefore increases when the word choice correctly represents a frame.
Ensemble of logistic regressions
[32] performed deductive analysis as a classification task based on an ensemble of logistic regression functions. They represented each article as a bag-of-words model and calculated TF-IDF scores for each term. Instead of a plain TF component they applied sub-linear frequency scaling of 1 + log(TF), and used l2 normalization with IDF smoothing by adding 1. Each frame had 3 to 4 framing devices, and each article in the training and testing sets was manually labeled 1 or 0 depending on whether a framing device was found in the text. Factor analysis on the framing devices described which ones are more influential, and, as can be seen in Figure 12, framing devices C2 and E2 were discarded due to low factor loading and coherence with frames.
Manually annotated documents (5875) were used for training and testing purposes based on ten-fold cross-validation. To measure the accuracy of the results, the researchers applied two metrics: agreement between coders or classifiers in an ensemble, and AUC. The results for both metrics are depicted in Figures 12a and 12b correspondingly.

Figure 12: Results of frame identification with human coding, a single logistic classifier, and an ensemble of classifiers for each of the identifier questions [32]: (a) intercoder and classifier agreement; (b) AUC of the ROC for automated classification prediction
The classifier ensemble showed better results compared to a single classifier; a comparison against a small manually annotated dataset (randomly chosen coders annotated 159 articles multiple times, approximately 3% of the overall number, to calculate inter-coder agreement) also yielded somewhat better prediction in 7 of 11 framing devices. Nonetheless, some framing devices are better classified than others; AUC indicated that the ensemble is more accurate in prediction by 20 percentage points on average.
The approach revealed accurate prediction results for some framing devices and poor ones for others. The scholars attributed this to ambiguous interpretation of the formulated frames and complex message characteristics. More comprehensive word choice and well-explained concepts would help to improve the results.
[9] applied similar measures and compared the performance of a holistic approach, in which a frame is coded as a whole, with an indicator-based approach, which is frequently used for manual deductive coding and consists of searching for a frame by determining the presence or absence of its attributes. In addition to AUC and Krippendorff's Alpha (KA) for inter-coder agreement, they stated that the accuracy (AC) of coincidence between manual and computer-assisted coding must be used. All three values together evaluate the correctness of a frame found in a text.
The results showed that, after representing documents with TF-IDF in the classification task, the holistic approach performed better than the indicator-based one. Moreover, a somewhat lower accuracy of supervised machine learning (SML) algorithms can be noticed when a set of documents originates from a different source than the training set. The experiment showed results ranging from 0.79 to 0.96 for generic frames such as conflict, economic consequences, morality, and human interest, from which the scholars concluded that SML is suitable for issue-specific deductive analysis.
Support Vector Machines
[18] attempted to identify differences in word choice between four news outlets. For the analysis they removed stop words, applied the Porter stemmer, and extracted unigrams, bigrams, and trigrams of words to represent texts as a bag-of-words model based on TF-IDF. To find similar articles, or "mates", they used the Best Reciprocal Hit (BRH) algorithm, which works similarly to bioinformatics algorithms that identify similar genes: cosine similarity in the resulting vector space model allowed extracting the top n nearest neighbors of each document. The scholars also used the article source as the label for the training data.
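A minimal sketch of this setup, assuming a toy two-outlet corpus (the original additionally applied a Porter stemmer and the BRH mating step, omitted here):

```python
# Sketch of the outlet-separability setup in [18]: TF-IDF over word
# n-grams (unigrams to trigrams, English stop words removed) feeding a
# linear SVM that predicts which outlet an article came from.
# The two-outlet corpus is invented; the original measured the
# precision/recall break-even point per outlet pair.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = [
    "militants attack convoy in disputed region",
    "militants strike patrol near the border",
    "freedom fighters resist occupation forces",
    "resistance fighters defend their homeland",
]
outlets = ["outlet_A", "outlet_A", "outlet_B", "outlet_B"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), stop_words="english"),
    LinearSVC(),
)
model.fit(texts, outlets)
print(model.predict(["fighters resist the occupation"]))
```

The contrast between "militants" and "fighters" mirrors exactly the kind of lexical choice the study set out to separate.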
A linear SVM classifier was trained and tested with ten-fold cross-validation, and the performance was measured by the break-even point (BEP), "which is a hypothetical point where precision (ratio of positive documents among retrieved ones) and recall (ratio of retrieved positive documents among all positive documents) meet when varying the threshold" [18]. One linear classifier was used for each pair of news outlets. The researchers stated that BEP reflects a measure of separability between the news outlets' lexical choice. Figure 13 depicts BEP values with the corresponding 2D MDS representation and an example of the most influential words with the highest TF-IDF scores for two news outlets.
Figure 13: BEP metric used to compare the word choice of news outlets [18]
Though the result yielded separable clusters of the words most specific to each outlet, the approach is rather straightforward and applied an SVM, which generally performs well for text classification problems [24]. Moreover, the problem was addressed as a comparison of word choice between news outlets, not as a search for frames within them.
Supervised Hierarchical Latent Dirichlet Allocation
Usually topic modeling with Latent Dirichlet Allocation (LDA), which assigns words from texts to abstract topics, is utilized as an approach for inductive framing analysis, but [31] suggested the Supervised Hierarchical Latent Dirichlet Allocation (SHLDA) method based on LDA, which aims at revealing not only the agenda-setting constituent but also how an issue is framed and how people speak about it. The author describes a frame as a second level of agenda-setting, containing the latent meaning of the text.
SHLDA is a supervised method that reveals topic hierarchies and represents a frame with its most probable words. Each document is assigned a response value (a label or numeric value) that represents the author's perspective, e.g. sentiment or ideology. A text is represented as a bag of sentences, and a sentence as a bag of words. Each word is then iteratively assigned to the frame it represents, and the output of the model is "a hierarchy of topics which is informed by a label" [31].

Figure 14: Results of the SHLDA algorithm: formed frames and their development and polarization over time by each political party [31]

Figure 15: Polarity of words derived by SHLDA [31]
Figure 14 depicts the result of the analysis. The lower levels of the topic hierarchy were more specific about word choice and show how the two political parties framed the issue according to their interests. The concept of the approach is similar to the previously described research with SVM, in that the analysis lies between deductive and inductive methods: on the one hand, a label or frame is pre-given and does not need to be derived, but on the other hand, the framing devices of a frame are unknown and the task is to find them. Apart from the topic hierarchy, the model provides a lexical regression parameter for each word, where the highest values show a positive association of a word and the lowest values a negative one. The lexical regression parameter allows analyzing results not only on the topic level, but also specifying the valence of each word (Figure 15).
3.2.2 Hierarchical clustering
For their deductive analysis of the biotechnology issue, [27] proposed a clustering method. They described a biotechnology frame with the framing devices suggested by Entman: problem definition, causal attribution, moral evaluation, and treatment recommendation. A manual inductive analysis relied on existing code-books for this topic and filled the framing devices' values with text elements obtained from the set of texts, resulting in a code-book specific to the particular research.
The idea of the suggested deductive method is to manually code the presence/absence of each attribute element for each article, use intercoder agreement to validate the model features, and then cluster the elements with Ward's hierarchical method. Clustering of the small frame elements revealed hidden connections between them and enabled deriving different frames for various time periods, determining in parallel the significance of each frame. Although the authors performed all coding of the frame elements manually, as well as assigning meaningful names to the frames obtained from the clusters, the approach yielded 3 clusters, i.e. 3 frames, statistically supported by heterogeneity measures.
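The clustering step can be sketched as follows, assuming a hypothetical binary coding matrix of articles against frame elements:

```python
# Sketch of the clustering step in [27]: articles are manually coded for
# the presence (1) or absence (0) of each frame element, and the
# elements themselves are clustered with Ward's method.
# The element names and the coding matrix are invented.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

elements = ["risk_to_health", "unknown_effects", "economic_gain",
            "scientific_progress", "moral_concern"]
# rows = articles, columns = frame elements (hypothetical coding)
coding = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
])
# Cluster the ELEMENTS (columns), so transpose the coding matrix
Z = linkage(coding.T, method="ward")
clusters = fcluster(Z, t=2, criterion="maxclust")
for name, c in zip(elements, clusters):
    print(f"{name:20s} cluster {c}")
```

Elements that co-occur across articles end up in the same cluster, which is then interpreted qualitatively and named as a frame.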
3.2.3 Homogeneity analysis
[45] performed framing analysis on the topic of refugees in Belgium, aiming to find and measure the presence of the derived victim and intruder frames in two newspapers. The author suggested a qualitative method of frame derivation, which he later formalized in [46], and applied it in the inductive analysis. Its result is a code-book with specified framing and reasoning devices and particular values for each frame. A part of the code-book is shown in Figure 16, where numbers in brackets represent framing devices of the victim-frame and letters those of the intruder-frame.
Figure 16: A sample frame matrix with framing devices [45]
The coders needed to indicate whether framing devices appeared or not. Two to three coders worked on these articles; therefore, Cohen's kappa was applied to calculate the intercoder reliability of each device, and devices with values below 0.6 were discarded. To determine which devices had the greater influence, the scholars relied on homogeneity analysis by means of alternating least squares (HOMALS), which has a similar purpose as MDS and projects values onto a lower-dimensional space, preserving the degree of significance of each value. Framing devices plotted close to each other in the two-dimensional space indicated similarity of articles. Figure 17 depicts the result of the HOMALS analysis. The clusters formed by the framing devices are clearly separable: intruder-frame devices occupy the left side of the plot, whereas victim-frame devices occupy the right side.
The authors described the separation into the upper and lower parts of Figure 17 as a distinction between journalists' perspectives: the bottom part is considered to have an in-group attitude, using pronouns such as we/our when talking about the issue, whereas the top part takes an out-group position, addressing the problem with the pronouns they/their.

Figure 17: Homogeneity analysis of framing devices between intruder-frame (left) and victim-frame (right) [45]

Figure 18: Average scores for intruder-frame and victim-frame representing the difference between coverage in Flemish-language (orange) and French-language (green) newspapers [45]
As a further step, the scholars analyzed the devices positioned in the center of the plot. Qualitatively, they derived that these devices are used interchangeably within articles and therefore cannot be clearly assigned to one of the clusters. In order to measure "how much" of each frame is detected in an article, they suggested using Dimension 1 to scale the influence of each particular framing device and summing these influences into a resulting index. Consequently, each article had two indices: one for the intruder-frame and one for the victim-frame. Grouped by news outlet, these values are depicted in Figure 18, where a green half-circle represents the French-language newspapers and an orange one the Flemish ones.
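The index construction can be sketched in a few lines; the Dimension-1 loadings and the device coding below are hypothetical stand-ins for the HOMALS output:

```python
# Sketch of the per-article frame index built from the HOMALS output in
# [45]: each framing device's Dimension-1 loading acts as a signed
# weight (negative = intruder side, positive = victim side), and an
# article's index sums the loadings of the devices coded as present.
# All loadings and device names here are hypothetical.
dim1_loading = {
    "illegal": -0.9, "flood_metaphor": -0.7,   # intruder-frame side
    "war_refugee": 0.8, "helpless": 0.6,       # victim-frame side
}

def frame_indices(devices_present):
    """Return (intruder_index, victim_index) for one article,
    reporting both magnitudes as positive sums."""
    intruder = sum(-dim1_loading[d] for d in devices_present
                   if dim1_loading[d] < 0)
    victim = sum(dim1_loading[d] for d in devices_present
                 if dim1_loading[d] > 0)
    return intruder, victim

article = ["illegal", "helpless"]        # devices coded as present
print(frame_indices(article))            # -> (0.9, 0.6)
```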
The results show that 50% of frames were shared between Flemish-language and French-language newspapers; while French-language newspapers tend to refer to the victim-frame, Flemish-language outlets usually use both.
Figure 19: Framing devices of Israel’s position in Hamas
conflict [44]
An important criterion of word choice is to cover exclusively the perspective of each particular frame. The paper showed that well-formulated framing devices allow comparing word choice in a quantitative way; for example, apart from comparing newspapers, the suggested index allows analyzing how often each frame was referred to during a given time period.
3.2.4 Semantic Network Analysis
[44] conducted deductive analysis based on Semantic Network Analysis (SNA). They represented sentences as semantic statements forming a network. The core object is a predicate representing an event. An event has a subject (the actor of the action), an object (the actor at whom the action is directed), and a source (the origin of the event or statement). The sentences are parsed with SNA into semantic constituents, and given the overall semantic network of words and their roles, analysis of the text becomes a search on the network. Figure 19 depicts an example of such a network.
Since framing devices also consist of a subject, predicate, and object, the search for a frame can be performed as a rule-based search for the elements of framing devices. Based on this, the overall structure of the analysis comprises (1) syntactic analysis by parsing the text, (2) role extraction for the words in a sentence, and (3) determining frames by rule-based relation extraction from the obtained network, with rules that specify framing devices.
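The rule-based search of step (3) can be sketched over a toy network of statements; the statements and the rule below are invented for illustration:

```python
# Sketch of the rule-based frame search over an SNA-style network as in
# [44]: sentences become (subject, predicate, object) statements, and a
# frame is found when a statement matches the role pattern of a framing
# device. Statements and the rule are hypothetical.
statements = [
    ("israel", "attack", "hamas"),
    ("hamas", "fire_rockets", "israel"),
    ("un", "criticize", "israel"),
]

# A framing-device rule: which words may fill each semantic role
self_defense_rule = {
    "subject": {"israel"},
    "predicate": {"attack", "retaliate"},
    "object": {"hamas"},
}

def matches(stmt, rule):
    subj, pred, obj = stmt
    return (subj in rule["subject"]
            and pred in rule["predicate"]
            and obj in rule["object"])

hits = [s for s in statements if matches(s, self_defense_rule)]
print(hits)   # -> [('israel', 'attack', 'hamas')]
```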
For the evaluation, the authors manually coded 100 test sentences, and to discuss the correspondence of manual and automated coding used inter-coder reliability values such as Cohen's kappa and Krippendorff's alpha in conjunction with the F1 score. Because benchmark values of inter-coder reliability are not strictly defined, the inference from this comparison has no solid basis; nevertheless, the authors consider the results to be quite good based on F1 scores ranging between 0.71 and 0.83.
4. DISCUSSION
Originally, framing analysis was a part of content analysis. Researchers from the social sciences conducted it manually based on qualitative methods for text analysis, but recently quantitative methods have become a more popular instrument [26]. Nevertheless, some researchers question the quality of quantitative analyses and point out the weakness of current text analysis methods in distinguishing the hidden meaning between the lines.
Methods for computer-assisted framing analysis exist for both inductive and deductive analyses. Their summary and additional information are presented in Appendices A and B. Some of the considered inductive approaches are still mostly based on the agenda-setting properties of a frame, rather than on framing itself. Framing attributes, being usually nouns, describe what to think about (topic), but not how to think about it (interpretation).
Frames are defined as salient pieces of information, and this salience comes from how frequently media or politicians refer to a specific concept. A simple interpretation of this lies behind the most frequent words in the text corpus, which tend to be a common starting point for inductive analysis (see Section 3.1.1). On the one hand, this approach follows the idea of word salience, but on the other hand, it misses the semantic connection between neighboring words.
Several approaches compared news articles based on the word choice and labeling of each news outlet. They extracted significant lexical elements based on syntactic properties of texts (Section 3.1.3), semantic properties (Section 3.1.4), or both (Sections 3.1.5 and 3.1.6). All the extracted information designated differences in word choice, but demanded additional qualitative interpretation of the words' polarity and interrelation. Moreover, these analyses did not intend to find how the perception of the same entity or concept changes depending on word choice, but rather searched for non-intersecting groups of words.
Frame construction with regard to word choice and labeling should meet the following requirements:
1. identify the main elements of a text, which could be answers to the 5W1H questions (see Section 2) or other interesting terms of a text;
2. distinguish and separate different frames based on the word choice of each category;
3. find accompanying words that form frame perception, i.e. sentiments.
Requirement 1 can be fulfilled with keyphrase extraction or Named Entity Recognition techniques; for requirement 2, a specific similarity measure should be applied to identify semantically related words; requirement 3 can be addressed by sentiment analysis.
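A toy sketch of the three requirements on one of the headlines from the introduction, with invented lexicons and thresholds standing in for proper NER, semantic similarity, and sentiment models:

```python
# Toy illustration of the three requirements: (1) naive capitalization-
# based entity spotting, (2) grouping label variants with a character-
# bigram overlap similarity, and (3) lexicon-based sentiment of the
# accompanying words. Lexicons and the 0.5 threshold are invented.
sentence = "Palestinian protester shot dead in West Bank"

# (1) crude entity/term spotting: capitalized tokens
tokens = sentence.split()
entities = [t for t in tokens if t[0].isupper()]

# (2) crude similarity to group label variants
def bigram_sim(a, b):
    ga = {a[i:i + 2] for i in range(len(a) - 1)}
    gb = {b[i:i + 2] for i in range(len(b) - 1)}
    return len(ga & gb) / max(len(ga | gb), 1)

labels = ["protester", "protesters", "youth"]
groups = [l for l in labels if bigram_sim("protester", l) > 0.5]

# (3) lexicon-based sentiment of accompanying words
negative = {"shot", "dead", "kill", "attack"}
score = -sum(1 for t in tokens if t.lower() in negative)

print(entities, groups, score)
```

Even this crude pipeline separates the entity ("Palestinian"), the label variants ("protester"/"protesters" group together, "youth" does not), and the negative tone carried by "shot dead".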
Deductive analysis approaches rely on classification methods (Section 3.2.1), clustering (Section 3.2.2), statistical methods (Section 3.2.3), or rule-based methods (Section 3.2.4). All methods depend on various properties of the already derived frames and on how they represent texts. Texts are usually represented as a table with attributes; attributes (or features) are coded in a binary way or with TF-IDF. [37] suggested several features for text processing tasks, especially classification, that could also be tried in deductive framing analysis.
A coded set of articles remains the most utilized prerequisite in deductive analysis; prior coded articles are especially required for a classification task. Typically, articles are coded manually, and intercoder agreement over the classes leads to good performance. A significant property of frames is a high quality of word choice that exclusively represents framing devices. However, the small amount of manually coded articles remains a problem for classification since, roughly speaking, the bigger the training dataset, the better the classification results. To enable fully automated coding, one of the following steps is required: (1) formulate the code-book's attributes as keywords and then search for them in the text, or (2) extract keywords from the text and calculate similarity measures with the code-book's attributes.
Solution 1 could be addressed with concept search (C-Search) [22], whose main idea is to use semantic search and, if it fails, to switch to syntactic search. It is an Information Retrieval (IR) technique, and its evaluation showed better results than simple keyword search. For solution 2, keyword extraction techniques can be applied, e.g. [7]. Well-coded frames with respect to the word choice of each frame attribute are an obligatory requirement for any deductive approach applied afterwards.
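Solution 2 can be sketched with a simple set-based cosine similarity between extracted keywords and code-book attributes; the keywords and attributes below are invented, and a real system would use a keyphrase extractor such as TopicRank [7] and a stronger semantic similarity measure:

```python
# Minimal sketch of solution (2): score each code-book frame by the
# set-based cosine overlap between its attributes and the keywords
# extracted from an article. All words here are hypothetical.
import math

def cosine(words_a, words_b):
    a, b = set(words_a), set(words_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

codebook_attrs = {
    "victim_frame": ["refugee", "flee", "war", "helpless"],
    "intruder_frame": ["illegal", "border", "crime"],
}
article_keywords = ["refugee", "war", "border", "shelter"]

scores = {frame: cosine(article_keywords, attrs)
          for frame, attrs in codebook_attrs.items()}
print(scores)
```

The frame with the highest score would be assigned to the article, replacing the manual presence/absence coding step.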
5. CONCLUSION
In this work we provided a comprehensive literature review of the existing computer-assisted framing analysis approaches, covering both inductive and deductive analyses. The considered approaches are semi-automated and usually include human interaction in the analysis pipeline. Moreover, only a few methods focus on word choice and labeling, and, to the best of our knowledge, none of them addresses the problem of identifying the word choice around a single concept. This work also suggested several ideas that could solve the problem of framing by word choice and labeling for both inductive and deductive analysis.
6. REFERENCES
[1] Honest reporting: Lack of context bias.
http://honestreporting.com/news-literacy-defining-
bias-lack-of-context/.
[2] Media bias in strategic word choice.
http://www.aim.org/on-target-blog/media-bias-in-
strategic-word-choice/.
[3] Sigkdd: The community for data mining, data science
and analytics. http://www.kdd.org/.
[4] Termine software.
http://www.nactem.ac.uk/software/termine/.
[5] D. S. J. Allen. Media bias: 8 types [a classic, kinda],
2015.
[6] S. Ananiadou, D. Weissenbacher, B. Rea, E. Pieri,
F. Vis, Y. Lin, R. Procter, and P. Halfpenny.
Supporting frame analysis using text mining. In 5th
International Conference on e-Social Science, 2009.
[7] A. Bougouin, F. Boudin, and B. Daille. Topicrank:
Graph-based topic ranking for keyphrase extraction.
In International Joint Conference on Natural
Language Processing (IJCNLP), pages 543–551, 2013.
[8] A. E. Boydstun, D. Card, J. Gross, P. Resnik, and
N. A. Smith. Tracking the development of media
frames within and across policy issues. 2014.
[9] B. Burscher, D. Odijk, R. Vliegenthart, M. De Rijke,
and C. H. De Vreese. Teaching the computer to code
frames in news: Comparing two supervised machine
learning approaches to frame analysis. Communication
Methods and Measures, 8(3):190–206, 2014.
[10] D. Chong and J. N. Druckman. Framing theory.
Annu. Rev. Polit. Sci., 10:103–126, 2007.
[11] S. R. Corman, T. Kuhn, R. D. McPhee, and K. J.
Dooley. Studying complex discursive systems. Human
communication research, 28(2):157–206, 2002.
[12] C. E. Crawley. Localized debates of agricultural
biotechnology in community newspapers: A
quantitative content analysis of media frames and
sources. Science Communication, 28(3):314–346, 2007.
[13] P. D’Angelo and J. A. Kuypers. Doing news framing
analysis: Empirical and theoretical perspectives.
Routledge, 2010.
[14] S. T. Dumais. Latent semantic analysis. Annual review
of information science and technology, 38(1):188–230,
2004.
[15] T. Dunning. Accurate methods for the statistics of
surprise and coincidence. Computational linguistics,
19(1):61–74, 1993.
[16] R. M. Entman. Framing: Toward clarification of a
fractured paradigm. Journal of communication,
43(4):51–58, 1993.
[17] R. M. Entman. Projections of power: Framing news,
public opinion, and US foreign policy. University of
Chicago Press, 2004.
[18] B. Fortuna, C. Galleguillos, and N. Cristianini.
Detection of bias in media outlets with statistical
learning methods. Text Mining, page 27, 2009.
[19] K. Frantzi, S. Ananiadou, and H. Mima. Automatic
recognition of multi-word terms. International Journal
of Digital Libraries, 3(2):117–132, 2000.
[20] G. Axtell, A. Bartley, et al. Radford University Core Handbook. Radford University.
[21] D. M. Garyantes and P. J. Murphy. Success or chaos?
framing and ideology in news coverage of the iraqi
national elections. International Communication
Gazette, 72(2):151–170, 2010.
[22] F. Giunchiglia, U. Kharkevich, and I. Zaihrayeu.
Concept search: Semantics enabled syntactic search.
2008.
[23] I. Hellsten, J. Dawson, and L. Leydesdorff. Implicit
media frames: Automated analysis of public debate on
artificial sweeteners. Public Understanding of Science,
19(5):590–608, 2010.
[24] T. Joachims. Text categorization with support vector
machines: Learning with many relevant features. In
European conference on machine learning, pages
137–142. Springer, 1998.
[25] D. Kahneman and A. Tversky. Choices, values, and
frames. American psychologist, 39(4):341, 1984.
[26] J. Matthes. What’s in a frame? a content analysis of
media framing studies in the world’s leading
communication journals, 1990-2005. Journalism &
Mass Communication Quarterly, 86(2):349–367, 2009.
[27] J. Matthes and M. Kohring. The content analysis of
media frames: Toward improving reliability and
validity. Journal of communication, 58(2):258–279,
2008.
[28] M. E. McCombs and D. L. Shaw. The agenda-setting
function of mass media. Public opinion quarterly,
36(2):176–187, 1972.
[29] M. M. Miller. Frame mapping and analysis of news
coverage of contentious issues. Social Science
Computer Review, 15(4):367–378, 1997.
[30] D. Nadeau, P. Turney, and S. Matwin. Unsupervised
named-entity recognition: Generating gazetteers and
resolving ambiguity. 2006.
[31] V.-A. Nguyen, J. L. Boyd-Graber, and P. Resnik.
Lexical and hierarchical topic regression. In Advances
in Neural Information Processing Systems, pages
1106–1114, 2013.
[32] D. Odijk, B. Burscher, R. Vliegenthart, and
M. De Rijke. Automatic thematic content analysis:
Finding frames in news. In International Conference
on Social Informatics, pages 333–345. Springer, 2013.
[33] Z. Pan and G. M. Kosicki. Framing analysis: An
approach to news discourse. Political communication,
10(1):55–75, 1993.
[34] Z. Papacharissi and M. de Fatima Oliveira. News
frames terrorism: A comparative analysis of frames
employed in terrorism coverage in us and uk
newspapers. The International Journal of
Press/Politics, 13(1):52–74, 2008.
[35] S. Park, S. Kang, S. Chung, and J. Song. Newscube:
delivering multiple aspects of news to mitigate media
bias. In Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems, pages 443–452.
ACM, 2009.
[36] S. Piao, Y. Tsuruoka, and S. Ananiadou. Hyseas: A
hybrid sentiment analysis system. In Proceedings of
the Fourth International Conference on
Interdisciplinary Social Sciences, 2009.
[37] P. Przybyla, N. T. Nguyen, M. Shardlow,
G. Kontonatsios, and S. Ananiadou. Nactem at
semeval-2016 task 1: Inferring sentence-level semantic
similarity from an ensemble of complementary lexical
and sentence-level features. Proceedings of SemEval,
pages 614–620, 2016.
[38] M. Recasens, C. Danescu-Niculescu-Mizil, and
D. Jurafsky. Linguistic models for analyzing and
detecting biased language. In ACL (1), pages
1650–1659, 2013.
[39] M. Scott. Wordsmith tools 6. Oxford: Oxford
University Press, 2011.
[40] M. G. Sendén, S. Sikström, and T. Lindholm. ”she”
and ”he” in news media messages: pronoun use reflects
gender biases in semantic contexts. Sex Roles,
72(1-2):40–49, 2015.
[41] Y. Tian and C. M. Stewart. Framing the sars crisis: A
computer-assisted text analysis of cnn and bbc online
news reports of sars. Asian Journal of
Communication, 15(3):289–301, 2005.
[42] M. Touri and N. Koteyko. Using corpus linguistic
software in the extraction of news frames: towards a
dynamic process of frame analysis in journalistic texts.
International Journal of Social Research Methodology,
18(6):601–616, 2015.
[43] G. Tuchman. Making news: A study in the
construction of reality. 1978.
[44] W. van Atteveldt, T. Sheafer, and S. Shenhav.
Automatically extracting frames from media content
using syntactic analysis. In Proceedings of the 5th
Annual ACM Web Science Conference, pages 423–430.
ACM, 2013.
[45] B. Van Gorp. Where is the frame? victims and
intruders in the belgian press coverage of the asylum
issue. European Journal of Communication,
20(4):484–507, 2005.
[46] B. Van Gorp. Strategies to take subjectivity out of
framing analysis. Doing news framing analysis:
Empirical and theoretical perspectives, pages 84–109,
2010.
[47] J. Woelfel. Artificial neural networks in policy
research: A current assessment. Journal of
Communication, 43(1):63–80, 1993.
[48] J. Woelfel and N. Stoyanoff. Catpac: A neural
network for qualitative analysis of text. In annual
meeting of the Australian Marketing Association,
Melbourne, Australia, 1993.
APPENDIX
A. SUMMARY TABLE OF COMPUTER-ASSISTED INDUCTIVE FRAMING ANALYSIS METHODS

Method name, source | Information about the data | Preprocessing | Methods | Output | Résumé
PCA on cosine
similarity matrix
[28]
• Associated Press dispatches
• 12.07.1984 – 27.06.1995
• 1465 articles
• Keyword: wetlands
• Stop words and ambiguous words
manually removed
• Number of most frequent words
chosen by authors
• 1 document = 1 list
• Cosine similarity matrix of most
frequent terms co-occurrence
• PCA
• Hierarchical clustering on 3
eigenvector values
• Table with frames’ names and
corresponding framing devices
• Visualization of frames in 3D
space
• Semi-automated
• Agenda setting
PCA on covariance
matrix
[11]
• Lexis-Nexis database
• 1.01.1992 – 1.12.2004
• 1156 articles
• Keywords: GMO, agricultural
biotech*, etc.
• Stop words and ambiguous words
manually removed
• Number of most frequent words
chosen by authors
• 1 document = 1 list
• PCA of most frequent words with
varimax rotation,
• 8 most meaningful eigenvalues
selected
• Terms with loading ≥ 0.3 form a frame
• Table with grouped framing
devices
• Qualitative analysis to interpret
results and name frames
required
• Semi-automated
• Agenda setting
Self-organizing map
[17]
• CNN and BBC websites
• 1.03.2003 – 1.09.2003
• 730 articles
• Keywords: SARS
• Stop words and verbs removed
• 1 document = 1 list
• Top 40 of most frequent words are
manually ranked
• Self-organizing map (unsupervised neural network) of most frequent words
• Hierarchical clustering based on
Ward’s method
• Table with grouped framing
devices
• Qualitative analysis to interpret
results and name frames
required
• Semi-automated
• Agenda setting
Semantic network
[22]
• New York Times website
• 1980 – 2006
• 54 articles
• Keywords: artificial
sweetener, etc.
• Data normalization
• Stop words removed
• All document for 1 topic = 1 list
• Number of most frequent words
chosen by authors
• Cosine similarity matrix of most
frequent terms co-occurrence
• Elements ≥ threshold form a network
• Normalize similarity
• Graph visualization
• Frames are obtained by visual
interpretation
• Semi-automated
• Agenda setting
Keyword extraction
[41]
• Lexis-Nexis database
• 01.2010 – 08.2010
• 40 articles
• Keywords: Greece, economy,
debt, crisis, etc.
• Bag-of-words of an article
• Bag-of-words of all articles
• Log-likelihood ratio calculation • List of concordances of
keywords
• Full qualitative analysis required
• Semi-automated
• Agenda setting
• Word Choice +
labeling
Centering Resonance
Analysis
[33]
• Lexis-Nexis database
• 06.2006 – 06.2007
• 218 articles
• Keywords: terrorist attacks,
Iraq, Israel, Afghanistan
• Only noun-phrases left, other
words form connections
• Pronouns are dropped
• Stemming
• Central resonance analysis – frames
are based on words of the biggest
influence or centrality
• Network of shared/distinct
keywords between frames
• Qualitative analysis for result
interpretation required
• Semi-automated
• Agenda setting
• Word Choice
Latent Semantic
Analysis
[39]
• Reuters website
• 1996 – 1997
• Not specified
• All articles
• Stop-words removed • Latent Semantic Analysis • Ordered list of terms and their
similarity value compared to a
given word
• Qualitative interpretation
required
• Automated
• Framing ±
• Word Choice +
labeling
Keyword-weight
model
[34]
• “20 most publishing news
providers”
• 2007
• 406 articles
• “many important events”
• Bag-of-words
• Stop-words removed
• Structure of text employed
• News Structure-based Extraction
• Keyword-weight model
• Aspect-based Clustering
• A system that shows spectrum
of articles with minimized
framing bias
• Automated
• Framing
• Word choice
Named entity
recognition
[35]
No information given No information given • Named Entity Extraction
• Term Extraction
• Sentiment Analysis
• Output from 3 independent
systems
• Qualitative interpretation
required
• Semi-automated
• Framing
• Word Choice +
labeling
B. SUMMARY TABLE OF COMPUTER-ASSISTED DEDUCTIVE FRAMING ANALYSIS METHODS

Method name, source | Information about the data | Preprocessing | Methods | Output | Résumé
Logistic Regression [7]
• Data: Lexis-Nexis database; 1990 – 2012; 9502 articles; keywords: smoking, immigration, same-sex marriage
• Preprocessing: manual frame coding on frame level; features: presence or absence of each word compared to code vocabulary; classes: presence/absence of a frame; 1 classifier per frame
• Methods: logistic regression
• Output: obtained prediction is used as a measure of frame presence; visualization; trained model
• Résumé: semi-automated; agenda setting; word choice and labeling
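The classifier setup in this row (one binary logistic regression per frame, with word presence/absence features) can be sketched without any ML library. [7] does not publish its training code, so everything below — the SGD training loop, names, and toy data — is our own simplified reconstruction:

```python
import math

def train_frame_classifier(docs, labels, vocab, epochs=200, lr=0.5):
    """Binary logistic regression for a single frame. Features are the
    presence (1.0) or absence (0.0) of each code-vocabulary word; the
    predicted probability is later read as a measure of frame presence."""
    feats = [[1.0 if w in doc else 0.0 for w in vocab] for doc in docs]
    weights, bias = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            z = bias + sum(wi * xi for wi, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                        # gradient of the log loss
            bias -= lr * g
            weights = [wi - lr * g * xi for wi, xi in zip(weights, x)]

    def predict(doc):
        x = [1.0 if w in doc else 0.0 for w in vocab]
        z = bias + sum(wi * xi for wi, xi in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    return predict
```

One such classifier is trained per frame, and its output probability on a new article serves as the frame-presence measure listed in the Output column.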
Ensemble of logistic regressions [31]
• Data: Dutch Lexis-Nexis database; 1995 – 2011; 5875 articles; keywords not mentioned
• Preprocessing: manual frame coding on attribute level; features: bag-of-words + TF-IDF score; classes: presence/absence of an attribute; 1 classifier per attribute
• Methods: factor analysis; ensemble of logistic regressions
• Output: obtained prediction is used as a measure of frame presence; table of comparison between predictions; trained model
• Résumé: semi-automated; framing; word choice + possibly labeling
Support Vector Machines [17]
• Data: websites of AJ, CNN, DN, and IHT; 31.03.2005 – 14.04.2006; 21552 articles, 675 left after preprocessing; keywords not mentioned
• Preprocessing: Porter stemmer; features: bag-of-words + TF-IDF score; classes: newspapers' names
• Methods: linear Support Vector Machines
• Output: table with BEP measure between pairs of news outlets; 2D visualization of MDS representation; list of most different word choice; trained model
• Résumé: automated; framing; word choice + labeling
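The bag-of-words + TF-IDF feature step shared by this and the previous row is small enough to show directly. A stdlib-only sketch using the common tf * log(N/df) weighting; [17] may use a different TF-IDF variant:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse TF-IDF vectors for tokenized documents:
    weight(t, d) = tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                 # document frequency per term
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors
```

A linear SVM is then trained on these vectors with the outlet name as the class label; the highest-weight features of the trained model yield the "list of most different word choice" in the Output column.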
Supervised Hierarchical Latent Dirichlet Allocation [30]
• Data: GovTrack; 109th US Congress; 5201 + 3060 turns; keywords not mentioned
• Preprocessing: classes: label classes derived from prior knowledge
• Methods: supervised hierarchical Latent Dirichlet Allocation
• Output: visualized hierarchy of topics based around given labels
• Résumé: automated; framing; word choice + labeling
Hierarchical clustering [25]
• Data: Lexis-Nexis database; 1992 – 2001; 1000 articles; keywords: biotech, genetic, genome, DNA
• Preprocessing: manual frame coding on attribute level; features: intercoder agreement
• Methods: Ward hierarchical clustering
• Output: table with attributes and values showing membership of a cluster/frame; qualitative analysis required to interpret results and name frames
• Résumé: semi-automated; framing; word choice + labeling
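Ward's criterion used in this row merges, at every step, the pair of clusters whose union least increases the total within-cluster variance. A compact greedy sketch on toy 2-D points, purely for illustration — [25] clusters intercoder-agreement vectors of frame attributes instead:

```python
def ward_clustering(points, k):
    """Agglomerative clustering with Ward's criterion. The merge cost of
    clusters A and B is |A||B| / (|A| + |B|) * ||centroid(A) - centroid(B)||^2,
    i.e. the increase in within-cluster variance caused by the merge."""
    clusters = [[p] for p in points]

    def centroid(cluster):
        return [sum(vals) / len(cluster) for vals in zip(*cluster)]

    def merge_cost(a, b):
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Naming the resulting clusters as frames is the qualitative step noted in the Output column.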
Homogeneity analysis [44]
• Data: Flemish- and French-language newspapers; 20.10.2000 – 29.04.2001 and 1.09.2002 – 31.08.2003; 1489 articles; keywords: refugees/asylum seekers
• Preprocessing: manual frame coding on attribute level; intercoder agreement calculation, attributes below 0.6 are discarded; features: intercoder agreement
• Methods: homogeneity analysis of framing attributes; index of frame presence calculation
• Output: visualization of homogeneity analysis; visualization of comparison between newspapers; visualization of frames' coverage dynamics; qualitative analysis required to interpret some results
• Résumé: semi-automated; framing; word choice + labeling
Semantic Network Analysis [43]
• Data: Lexis-Nexis database; 27.12.2008 – 20.01.2009; 20414 articles; keyword: Gaza
• Preprocessing: parsed sentences of attributes and texts
• Methods: rule-based Semantic Network Analysis
• Output: table with measures for each attribute found; trained model
• Résumé: semi-automated; framing; word choice + labeling
B. SUMMARY TABLE OF COMPUTER-ASSISTED DEDUCTIVE FRAMING ANALYSIS METHODS
Framing is a conceptualization of the way people organize, perceive, and communicate information. It is an instrument of political science that communicates ideas and messages and "defines issues" [46]. The term framing was first introduced by Tuchman [43]. A frame is a framing entity, defined as a strong communicative tool that compactly unites a set of ideas to be conveyed to people, including problems, judgments, actions, causes, and solutions. In [16], Entman suggested a description of framing: "selecting of some aspects of a perceived reality and making them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation".

Usually a frame is a pattern of the most important parts of the messages that members of political parties, newspapers, or individuals convey in pursuit of an underlying interest. A frame should be a rather short and simple message that reflects a particular event or group of people [17], and it is memorable and reproducible, which enables its further spread, use, and evolution.

Politicians use frames to motivate people to act within a frame's boundaries and its idea [10], affecting attitudes and behaviors. Frames convey messages with symbolic elements and metaphors, utilizing stereotypes, word choice, and labels, which together consolidate an idea into an entity that [46] calls a package. This package embeds cultural and individual perception and resonates most strongly in a person's mind. Specifically chosen words have a great impact on people by invoking well-known associations, images from previous experience, and lexical tones.

The definition of a frame varies in its interpretation; consequently, social scientists define and apply several methods for frame derivation. Regardless of specifics, all methods are either inductive or deductive.
In inductive analysis, a set of texts is analyzed in order to define a frame as a piece of the message that these texts or news convey [46]. The task is to find important, or, in Entman's terms, "salient" information, either by the frequency of influential words or by the words' impact on the perception of the text. Deductive analysis works with predefined frames and existing codebooks that describe a frame, and determines evidence of a frame being present in a text.

The main challenge for computer-assisted frame analysis is finding a frame and its elements, as frame analysis remains mostly qualitative. Therefore, the research questions of the paper are: (1) how do scholars approach computer-assisted framing analysis, and (2) which methods focus the analysis on constructing or finding frames based on word choice and labeling?

The paper is organized as follows: we start with an overview of the forms of frame analysis and its general properties, then review the existing approaches and methods for finding a frame, and discuss current and possible solutions for framing with respect to the word choice and labeling problem.

2. FRAME ANALYSIS

Framing is a process of issue conceptualization that uses a frame as a tool. A frame is a system of organized ideas and messages, which are called attributes or devices [33]. Each
device consists of highly influential words, forming a structure that acts as a trigger on people, aiming at a specific reaction based on a particular background: cultural, symbolic, and psychological. [16] discussed information as influential and important, or, as he defined it, "salient", first, based on word frequency: if one wants to emphasize a piece of information, it needs to be repeated. Second, the use of familiar symbols emphasizes the perception of the information and triggers fast interpretation of well-known concepts. All these methods activate a frame and make it prominent [46].

Framing attributes can be associated with a lens, providing specific boundaries and a perspective on a set of views [13]. [33] suggested four categories of structures for framing devices:

1. syntactical structure — refers to the inverted pyramid in news discourse and its structural elements, i.e. headline, lead, and main body, where words and phrases have influence in inverse order to their location in the text — technical devices;

2. script structure — represents a sequence of activities and components that a single event can consist of, i.e. the 5 W and 1 H questions: who, what, when, where, why, and how — framing devices;

3. thematic structure — defines an article as a theme or as being a subtheme — framing device;

4. rhetorical structure — consists of word and stylistic choices, metaphors, exemplars, catchphrases, depictions, labels, citations, and visual images, which all together increase the salience of a given point of view and substitute facts with their interpretation — rhetorical devices.

The word choice aims at a predefined reaction and way of interpretation, thus influencing the decision-making process [25]. Specific word choice changes the polarity or valence of the whole story; therefore, framing can introduce bias when operating with subjective words and tones [38].
It can be done by addressing people with specific terms and words that correspond to their background and previous experience. One concept can be represented by different words, and the selection of a particular word combination depends on the aim of a frame. Consider the examples: "Heart-wrenching tales of hardship faced by people whose care is dependent on Medicaid" is strongly biased towards pity compared to "Information on the lifestyles of Medicaid dependents" [2]. Well-informed people tend to have a solid opinion and position on various questions and will distinguish the difference between the two examples, but others can be influence-prone.

Labeling of an opponent aims to devalue their opinion with a strong negative association [20]. The most popular labels refer to ideas, political organizations, and activists, and form a bias by label [5]. Applying a label triggers fast image construction and frequently leads to image distortion compared to the rest of the content. For example, when conservatives are called "far right", it leads to a negative perception of the whole story, while radicals might obtain more positive labels, and the picture will change correspondingly.

Framing analysis can be classified as follows [26]: (1) framing and agenda-setting; (2) qualitative (text-based) or quantitative (number-based); (3) manual or computer-based frame coding; (4) inductive or deductive approaches.

Framing and agenda-setting. Frames, depending on the forms of the devices they consist of, represent either framing or agenda-setting. The main difference lies in their functions: agenda-setting tells people what to think about, while framing influences how to think about it [28]. In other words, agenda-setting provides the topics and themes that news covers. Framing plays the role of a second level of agenda-setting, and therefore describes the reasons for news elements' salience and provides an interpretation of news stories.

Qualitative and quantitative analysis.
A typical framing analysis is based on qualitative analysis, where coders derive a frame from a set of articles. A coder is a social scientist who reads news articles, highlights the most significant information, finds actors and actions, defines the intonation, metaphors, and lexical choices in the text, and then groups the obtained frame elements by generalizing and summarizing them [46]. The overall process tends to be slow and applicable only to a small number of articles. Unlike qualitative analysis, quantitative analysis describes statistical word properties and the interrelation between words. Very basic approaches rely on word frequency, but the general idea is to convert text into numerical values and apply subsequent transformations and calculations.

Manual and computer-based frame coding. Advantages and disadvantages go along with both manual and computer-based frame coding. A lack of "fully developed operational procedures" with respect to ambiguous frame construction was named in [33] as a reason why computer-based approaches were impossible. Another obstacle to automation is that metaphors and turns of speech chosen as framing devices [45, 27] carry hidden meaning and require human interpretation.

Nonetheless, manual framing analysis has its own drawbacks. Firstly, it is time-consuming, and it is hard to analyze texts at a large scale. Secondly, framing analysis can be biased and depends on the psychological and cultural background of a researcher. Thus, computer-assisted analysis tends to be more objective and reliable [41]. An analysis conducted with a computer-based approach can be repeated by other researchers, and the results obtained by applying the algorithms several times will be the same, which is a concern for manually coded results.
Moreover, for the computer-assisted approach [10] states that it is important to predefine a universe of words describing a frame: it simplifies word selection during framing and makes the vocabulary more specific to the texts' topic, and thus more likely to correctly find a frame, at least in deductive analysis.

Inductive and deductive approaches. Framing analysis can also be divided into two approaches: inductive and deductive. The inductive approach tries to reconstruct frames from a given set of texts. It aims at finding both a general conveyed idea and its framing attributes. In deductive analysis a frame is predefined, and the task is to search for the framing devices to confirm the presence of a frame [10].

3. RELATED WORK

Though framing analysis is typically a task performed manually by social scientists, over the past years research into computer-assisted approaches has been growing. It is important
to highlight the term "assisted", because currently a lot of manual interaction exists, and some steps in the analysis rely on a researcher's knowledge and experience of how to use or interpret the results. Nevertheless, the following section aims at giving an overview of the utilized inductive and deductive approaches.

3.1 Inductive analysis

Inductive analysis involves procedures aimed at reconstructing or deriving frames from a set of given texts by finding similar features, which in Section 2 are defined as framing devices. Usually there is no prior information given except for the criteria for news article preselection.

3.1.1 Most frequent words

The term "salient word", in the context of content analysis, usually describes words with a high frequency in the text. In framing analysis, which originated from content analysis, some approaches suggest that as long as a word remains frequent, it is a key word and influences the content and perception of the information presented there. The methods described below are based on this word property, but apply various post-processing operations.

PCA on cosine similarity matrix. One of the first scholars who suggested and performed computer-assisted framing analysis was [29], and he called this approach frame mapping. The scholar wanted to analyze the frames of two competing groups of stakeholders, how they are covered in the news, and which terms and word choices are associated with them. The suggested method mainly relies on frame derivation based on clustering. The VBPro software family was used in the analysis. Before the analysis, several preprocessing steps were applied. Stopwords were removed before the word list with corresponding frequencies was obtained for each document. The most frequent words were selected as candidates for frame terms; their number was based on the expertise of the researcher. Then, terms with the same root were combined into one "word term".
This transformation is similar to stemming, but additionally these "word terms" included synonyms from a manually developed dictionary. If a term had several meanings and some of them applied to a word within a text, then this word was removed or manually tagged via VBPro to distinguish the meanings. Because the aim of the research was to identify specific terms that co-occur with each stakeholder, the researchers manually marked each document by adding a new word with a special symbol into the word list. The last step was to compute the cosine similarity matrix between all pairs of documents. If terms always co-occurred, the matrix value resulted in 1, and 0 otherwise.

An eigenvalue extraction aimed at determining the most frequent pattern of co-occurrences between words. The result of this analysis step was a list of words and the eigenvectors associated with it, and in multidimensional space the values can be plotted on a concept map. Words that co-occur together are plotted closer to each other. The large quantity of terms led to a very dense plot, and therefore the words were clustered using an agglomerative hierarchical clustering method with cosine similarity as a distance metric. On each iteration the algorithm grouped pairs of similar objects until it reached one single cluster. The words that formed the upper-level hierarchies named the obtained frames.

Figure 1: Results of inductive frame derivation conducted by hierarchical clustering [29]

The authors suggested that labels inserted into the texts will be described by regular words if these labels are clustered at the early stages. Figure 1 shows the obtained clusters, which refer to frames, the words contained in the clusters, and the eigenvector values used later to plot frames in 3D space. The inserted labels also had their own clusters and representing eigenvector values.
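The core of this frame-mapping pipeline (cosine similarity over term frequency vectors, followed by agglomerative clustering of the words) can be sketched in a few lines. The toy vectors and the average-linkage criterion are illustrative assumptions, not the exact VBPro procedure:

```python
# Sketch: cosine similarity between per-document frequency vectors of
# candidate frame terms, then agglomerative clustering of the terms.
# Toy data; the linkage choice is an assumption for illustration.
import math

# rows: candidate frame terms, columns: per-document frequencies (toy data)
word_vectors = {
    "habitat":  [3, 2, 0, 0],
    "species":  [2, 3, 0, 1],
    "property": [0, 0, 4, 3],
    "tax":      [0, 1, 3, 4],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def avg_distance(c1, c2):
    # average-linkage distance between two clusters, on cosine distance
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(1 - cosine(word_vectors[a], word_vectors[b])
               for a, b in pairs) / len(pairs)

clusters = [[w] for w in word_vectors]
while len(clusters) > 2:        # stop at 2 clusters ~ 2 candidate frames
    i, j = min(
        ((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
        key=lambda ij: avg_distance(clusters[ij[0]], clusters[ij[1]]),
    )
    clusters[i] = clusters[i] + clusters.pop(j)

print(clusters)  # words grouped into frame candidates
```

On a realistic vocabulary the same idea is usually run with a library routine such as scipy's hierarchical clustering, cutting the dendrogram at the level whose upper clusters name the frames.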
Figure 2 shows two resources, represented by Property-Owner Advocates and Conservation Advocates, and frames whose proximity to the resources depicts how each resource is framed.

Figure 2: Derived frames and their comparison to the studied resources [29]

The obtained results present frames more as agenda-setting, describing each resource with respect to the topics that each source or actor of the discussion chose to support their position. However, the results do not concentrate on the lexical choice of the frames, or on whether a frame has a positive, negative, or neutral influence on the perception of the topic or actor.

PCA on covariance matrix. [12] conducted a similar computer-based analysis on biotechnology and GMO products, with the goals of frame reconstruction, frame comparison, and analysis of lexical choice. The main idea of using the most frequent words came from [29], and for the analysis they used the WordStat and SPSS software. They also used manual
selection of the top most frequent words with respect to their meaningfulness, interpretability, and absence of ambiguity, and the number was limited to 130 words. Having a list of frequencies of the preselected words per document, the researchers computed two covariance matrices, one for each newspaper. Principal component analysis with a varimax rotation was applied as a tool, which searched for the relationships between words and also showed the latent structure. Eigenvalues higher than 1 resulted in 8 meaningful frames both for the Missouri and the Northern California newspapers. If a word had a loading factor greater than or equal to 0.3, then this word was part of a cluster.

Figure 3: Frames formed by PCA on the most frequent words between documents [12]

The researchers named the frames manually and compared the word choice between similar frames in the newspapers. They stated that the agriculture topic is framed differently by the newspapers from the two states, and as seen in Figure 3 some differences exist (6 out of 8 frames are the same), but it is hard to discuss how a specific word choice changed the perspective on the issue. Most of the words are neutral nouns that describe a frame more as agenda-setting. A hint of word valence is given by the label "Frankenfoods", which refers to GMO plants as "mutant food", but the term lacks an explanation by sentiments and therefore obtains meaning only with the help of manual qualitative analysis.

Word co-occurrence

Self-organizing map. Yan Tian and Concetta M. Stewart conducted news framing analysis for the SARS crisis, aiming at finding the frames that CNN and BBC applied to cover the topic [41]. The CatPac (Category Package) software [48], which was used as the analysis tool, is based on a self-organizing map, an unsupervised artificial neural network, and aims at finding semantic relationships between concepts (words) that are represented as neurons. One concept can have the same meaning for several words.
The similarity of concepts is based on the word patterns used in the text. As a preprocessing step, the researchers constructed one list of words representing all articles of one newspaper; the program selects a range of the most frequent words and removes stop-words and verbs, and the final list of words was corrected by a researcher. The neural network is trained with this word list: a sliding window of a selected size runs over a text, and if two terms co-occur, the weight of their similarity is increased w.r.t. the learning rate, the activation values for each node (word), and the previously obtained weight [47]. The result is a matrix of trained weights, and in order to obtain frames, Ward's hierarchical clustering is applied. The results yielded two dendrograms, whose clusters were named manually.

Figure 4: Result of inductive analysis with a self-organizing map

The results in Figure 4 depict the obtained frames and their constituents. As already seen in the previous approaches, the extracted examples of word choice represent keywords. They rather show an agenda-setting structure, introducing sub-topics of the crisis as various perspectives, than answer the question of how to think about the issue and what the difference in interpretation of the topic is.

Semantic network. [23] conducted an analysis of the impact of artificial sweeteners in the media with the assistance of the TextSTAT and Pajek programs. The authors suggested the construction of semantic maps to reveal hidden word meaning, which they called implicit frames. Texts were represented as lists of the 100 most frequent words, and to draw a similarity measure between words, the scholars calculated cosine similarity matrices for each period they were interested in. In order to find the constituents of a frame, a threshold was derived from the mean cosine value of the lower triangle of the obtained matrix.
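A minimal sketch of this semantic-network construction, with cosine similarities between toy co-occurrence vectors and the edge threshold set to the mean of the lower triangle of the similarity matrix (the words and vectors are invented, not the study's data):

```python
# Sketch: build a semantic network by linking word pairs whose cosine
# similarity exceeds the mean of the lower triangle of the matrix.
import math

words = ["sweetener", "cancer", "study", "sugar"]
vectors = {                      # toy per-document frequency vectors
    "sweetener": [4, 3, 2],
    "cancer":    [3, 4, 0],
    "study":     [1, 0, 5],
    "sugar":     [4, 2, 2],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# lower triangle of the pairwise similarity matrix
sim = {(a, b): cosine(vectors[a], vectors[b])
       for i, a in enumerate(words) for b in words[i + 1:]}

threshold = sum(sim.values()) / len(sim)      # mean of the lower triangle
edges = [pair for pair, s in sim.items() if s > threshold]
print(edges)  # word pairs strong enough to appear in the semantic map
```

The surviving edges are what a tool like Pajek would then lay out spatially, with clusters of nearby nodes read off as implicit frames.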
Multidimensional scaling can be applied to reduce dimensionality, but the authors plotted the matrix as a network, where the nodes were words, the number of edges reflected the semantic relations to other words, and the length of an edge the similarity measure between the words. Spatial proximity of terms led to the clusters, or frames, depicted in Figure 5. Based on Figure 5 it is hard to derive meaningful frames. It depicts the relations, and the obtained results could
tell how the words are related to each other, but neither the number of connections nor their length gives an objective indication of how to read it. Moreover, the words used for the nodes describe the agenda-setting of this problem in the newspapers, not how this information can be interpreted.

Figure 5: An example of a semantic network used to form frames around the term "artificial sweetener" [23]

3.1.2 Keyword extraction

[42] gave an overview of linguistic instruments for frame construction with WordSmith [39]. They stated that using keywords follows Entman's description of a frame in terms of the salience of words. "Keyness" was based on a log-likelihood that compares the number of occurrences of a given word in the corpus with a reference corpus representing the language of the text [15]. This approach is more sophisticated than using just a simple list of the most frequent words, and takes into account the context in which the word choice is used. Nevertheless, keywords were not said to represent a frame, but helped to establish its key topic and central meaning. Word concordance extracted the words to the left and right of the keyword, thereby broadening the context in which the keyword was used and revealing a candidate for a framing device. Frame construction itself was based on qualitative methods, which formed the frame packages suggested by [45]. Performing a substantial part of the analysis manually makes this a semi-automated approach, but one subject to a statistically confirmed selection of framing devices. Word choice is based on the predefined frame components, but the frame coding itself and the determination of the word choice of each frame depend on manual coding.

3.1.3 Centering Resonance Analysis

[34] performed a comparative analysis of terrorism coverage in the UK and the USA. The scholars used Centering Resonance Analysis (CRA) [11] on the whole set of documents.
CRA finds the most important words with respect to their frequency and influential positions. CRA creates a network of the objects and subjects of the text, and therefore selects nouns and noun phrases as nodes in this network. Other parts of speech, e.g. verbs, are excluded from the main components but are used to link different nodes. The researchers stated that "nouns denote conceptual categories that provide more salient discourse information than verbs" [11]. A word has a high value of betweenness centrality, which is represented "by the number of times the words were linked in the text according to the rules above". The relative influence of a word denotes the share of shortest paths between other nodes that pass through the considered node, and is calculated as

I_i = ( Σ_{j<k} g_jk(i) / g_jk ) / [ (N − 1)(N − 2) / 2 ]

where g_jk is the number of shortest paths connecting the j-th and k-th words, g_jk(i) is the number of those paths passing through word i, and N is the number of words in the network.

An example of the obtained results of the comparison between two newspapers is shown in Figure 6. The words of each newspaper are grouped so that the more interrelated ones appear closer to each other, yielding frames. The coefficient of betweenness was used to measure the similarity of the obtained frames and also of the words in a frame, but the researchers did not provide these numbers to compare the results. [21] conducted a similar analysis based on CRA, but their results did not provide any additional input for result comparison.

Figure 6: An example comparison between the similarity and difference of framing terrorism between newspapers of two countries [34]

The whole analysis is performed with a computer-based approach, but was evaluated by manual discourse analysis. The results show the coverage of topics within the issue.
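The influence measure is, up to the normalization by (N − 1)(N − 2)/2, standard betweenness centrality on the noun network. A self-contained sketch on a toy graph of co-occurring nouns (the graph itself is an assumption for illustration):

```python
# Sketch: betweenness-style relative influence of nouns in a CRA-like
# network. Toy edges link nouns that co-occur in noun phrases.
from collections import deque

edges = [("attack", "city"), ("city", "police"), ("police", "suspect"),
         ("city", "government")]
nodes = sorted({n for e in edges for n in e})
adj = {n: set() for n in nodes}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def bfs_paths(src):
    """Distances and shortest-path counts from src (BFS)."""
    dist, sigma = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v precedes w on a shortest path
                sigma[w] = sigma.get(w, 0) + sigma[v]
    return dist, sigma

dist = {n: bfs_paths(n)[0] for n in nodes}
sigma = {n: bfs_paths(n)[1] for n in nodes}

N = len(nodes)
influence = {}
for i in nodes:
    total = 0.0
    for j in nodes:
        for k in nodes:
            if j < k and i not in (j, k):
                # i lies on a shortest j-k path iff the distances add up
                if dist[j][i] + dist[i][k] == dist[j][k]:
                    total += sigma[j][i] * sigma[i][k] / sigma[j][k]  # g_jk(i)/g_jk
    influence[i] = total / ((N - 1) * (N - 2) / 2)

print(max(influence, key=influence.get))  # the hub noun of the toy network
```

On the toy graph the hub noun "city" ends up with the highest relative influence, while peripheral nouns score zero, mirroring how CRA ranks discourse-central words.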
Some words can even give an overview of the valence of the news coverage, but because some parts of speech were omitted and adjectives did not receive enough influence in the network, it is hard to derive specific framing devices that would describe a frame.

3.1.4 Latent Semantic Analysis

[40] based the detection of the "he" and "she" frames on Latent Semantic Analysis (LSA). LSA [14] is a method from Information Retrieval that is also frequently used in social science applications. Its key feature is the possibility of detecting words that represent similar concepts, synonyms, or semantically related concepts. It represents texts as a matrix of words in texts with either word frequencies or TF-IDF scores as values. The main objective was to determine how men and women were represented in the media news in 1996-1997, and which terms were chosen to describe them. The results are shown in Figure 7, where semantically related terms are shown in conjunction with their cosine similarity. The scholars wanted to test the valence of the context around the two pronouns, and the expectation that the pronoun "he" would be more positive than "she" was confirmed by a univariate analysis of variance (ANOVA), where they used valence as the dependent variable and gender as the independent one. With this analysis they also proved the hypothesis that the pronoun "she" would have more gender-determined labels; indeed, in
Figure 7 we see that such labels as "mother", "woman", and "girl" have high similarity values and can represent one entity and its characteristics.

Figure 7: Results of semantically related words and cosine similarity values for the defined frames [40]

All in all, though the context of the pronoun "he" was covered 9 times more frequently than that of "she", the analysis allowed the comparison of word instances w.r.t. their coverage. The interpretation of word valence is based on a qualitative analysis of the obtained word choice, and is not automatically presented to the users.

3.1.5 Keyword-weight model

The goal of [35] is to address the problem of media bias and also to overcome the slant of a frame that narrows a topic to a specific perception. They proposed the NewsCube system, which utilizes (1) a keyword-weight model, and (2) framing cycle-aware clustering. The keyword-weight model is discussed as an optimal solution between simple keyword extraction by frequency and complex syntactic and semantic parsing, which often yields ambiguous results. The architecture of the system is depicted in Figure 8.

Figure 8: NewsCube architecture [35]

The importance of words depends on the news pyramid structure: the most prominent words and salient information appear in the headline, then the sub-head, the lead, and only then in the main text. Therefore, the suggested news structure-based extraction not only counts word frequencies but also weights them considering the location of a word.

Figure 9: ASSIST software architecture [6]

The results are normalized with respect to the length of the structural element where the word was found. For the frame derivation they used the concept of covering an aspect over time in a frame cycle. It comes from a property of news issues: when an issue has just appeared, all news media use the same source of information covering the basic facts, and then extend it with additional sources. The additional sources highly depend on the supported bias.
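The structure-based keyword weighting can be sketched as follows; the position weights and the toy article are illustrative assumptions, not NewsCube's actual values:

```python
# Sketch: weight term counts by pyramid position (headline > lead >
# body) and normalize by the length of each structural element.
# The weights 3/2/1 are assumed values for illustration.
from collections import Counter

article = {
    "headline": "palestinian protester shot dead",
    "lead": "a palestinian protester was shot dead in the west bank",
    "body": "witnesses said the protester threw stones before officers opened fire",
}
position_weight = {"headline": 3.0, "lead": 2.0, "body": 1.0}  # assumed

scores = Counter()
for part, text in article.items():
    tokens = text.split()
    for token, count in Counter(tokens).items():
        # weight by position, normalize by element length
        scores[token] += position_weight[part] * count / len(tokens)

top = [w for w, _ in scores.most_common(3)]
print(top)
```

Words repeated in the headline and lead dominate the ranking, which is the intended effect: the structurally prominent word choice surfaces even without any semantic parsing.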
The framing cycle-aware clustering focuses on differentiating the head group of common articles from the tail group by calculating the commonness and uncommonness between articles and then splitting them in 2D space. The measures of (un)commonness are based on cosine similarity. The obtained values depend on the keyword weights and the corresponding commonness and uncommonness of keywords within an article, which describe how often keywords appear in the considered set of articles. With respect to the research question, the keyword-weight model represents an improvement over the previously observed solutions based on simple most-frequent-word extraction in Section 3.1.1. On the other hand, the approach simplifies the model to the syntactic structure, avoiding the extraction of semantic information and rather utilizing it as cosine similarity. It allows readers to have a more objective scope of news articles on an issue, but does not explain or construct frames that change the perspective with a particular word choice.

3.1.6 Named entity recognition and sentiment analysis

[6] have suggested the ASSIST system for approaching the framing analysis problem. Their solution consists of three modules: named entity extraction, term extraction, and sentiment analysis, thus forming a text mining platform to support framing analysis. The architecture of the ASSIST software is represented in Figure 9. Named entity recognition (NER) allows a user to identify the main roles in the text. The authors point out that the roles and locations typically used in NER alone are not useful for their task, and therefore they expanded the entity types and recognized 26 categories. This resulted in better recognition of the main topics or frames in the text and also in a semantic annotation of the words, but also highly depended on the algorithm of the particular NER
implementation. The BaLIE semi-supervised NER [30] used in ASSIST faced the problem of assigning categories to the same word belonging to different entity types, e.g. a name and a town.

Term extraction is similar to keyphrase extraction. It focuses on the salient linguistic elements that form a specific topic or concept [19, 4]. These concepts can also represent the word choice and labeling that the author of a news article selects to describe an event or situation. TerMine combined the ideas of term formation patterns and statistical information about a term. The authors provided examples where labels such as "Big Brother" and "DNA database" were extracted from newspaper articles about ID cards.

The sentiment analyzer was the HYSEAS software [36]; it determined the tone of the words used in the text and produced a summarized score per sentence. The module accomplished the task of determining the positive, negative, or neutral effect on the perception of an article, though it faced accuracy problems, especially with misclassified neutral sentences.

The suggested system was developed to aid social scientists in performing semi-automated framing analysis. The outputs of the modules required further qualitative analysis; the system did not provide an aggregated result-analyzing module or visualization, but rather three disjoint outcomes for further processing.

3.2 Deductive analysis

Deductive analysis solves the task of determining specified frames in a set of given texts. Methods either treat a frame as a whole, or split it into framing devices and try to find a frame by its parts. For the following approaches, news articles are already coded, mostly manually, and the task of computer-assisted deductive analysis is to increase the speed and accuracy of the frame search in the remaining scope of articles.

3.2.1 Classification

Classification is the Machine Learning (ML) task of identifying the category of an observation based on a given set with known membership.
In order to train a model, one needs a dataset with given labels which specify class membership. Then, roughly following the Knowledge Discovery in Databases (KDD) pipeline [3], model features are constructed, resulting in a final dataset split into two parts for training and testing purposes.

Logistic regression. The task of the research of [8] is to measure how much each of 16 frames is present in the smoking, immigration, and same-sex marriage issues. Binary logistic regression classifiers were trained for each frame, resulting in 16 × 3 = 48 classifiers in total. A binary classifier decides whether a frame is present or not. As features for the model the researchers suggested using a binary code for each word in a framing code book: if a word was found in an article it had the value 1, and 0 otherwise. The prediction result is a real value, and it measures the probability of a frame in the considered article. Classification results for the smoking issue are depicted in Figure 10.

The area under the curve (AUC) is a standard measure of classification correctness; it "expresses the probability that the classifier will rank a positive document above a negative document" [32]. The researchers used it to show that the logistic regression in the approach performed better than a random classifier, and that some devices have more prominent words and are more easily distinguished than others. The most significant words of the best detected frames can be identified: it is suggested to calculate the product of each learned coefficient for a feature (word) and the feature's average value in the testing set. This value favors frequent terms and lowers the influence of infrequent ones.

Figure 10: Accuracy of frame search with logistic regression for the smoking issue [8]

Figure 11: Highest-impact words in classifiers for the smoking issue [8]

Figure 11 depicts an example of the three most represented frames and their framing devices for the smoking issue.
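A toy sketch of this setup: binary codebook features, a plain gradient-descent logistic regression (standing in for the original implementation), and the coefficient-times-average-value word-impact score. The corpus and the codebook are invented for illustration:

```python
# Sketch: binary codebook features for frame presence, logistic
# regression trained by plain SGD, and the word-impact score
# (learned coefficient x average feature value).
import math

codebook = ["addiction", "tax", "health", "freedom"]   # words of one frame
docs = [
    ("smoking causes severe health problems and addiction", 1),
    ("new tax on cigarettes debated", 1),
    ("local team wins the championship", 0),
    ("freedom of choice argued by smokers", 1),
    ("weather stays sunny all week", 0),
]

def features(text):
    words = set(text.split())
    return [1.0 if w in words else 0.0 for w in codebook]

X = [features(t) for t, _ in docs]
y = [label for _, label in docs]

# plain stochastic gradient descent for binary logistic regression
weights = [0.0] * len(codebook)
bias = 0.0
for _ in range(500):
    for xi, yi in zip(X, y):
        z = bias + sum(w * f for w, f in zip(weights, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        err = yi - p
        bias += 0.1 * err
        weights = [w + 0.1 * err * f for w, f in zip(weights, xi)]

# word impact: coefficient times the word's average feature value
avg = [sum(col) / len(X) for col in zip(*X)]
impact = {w: c * a for w, c, a in zip(codebook, weights, avg)}
print(sorted(impact, key=impact.get, reverse=True))
```

Since every codebook word here appears only in positive articles, all learned coefficients (and hence all impacts) come out positive, which is the pattern Figure 11 reports for the highest-impact frame words.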
The results confirm the importance of well-defined framing devices and the code book. The number in brackets in Figure 10 shows the prevalence of each frame in the texts, which is used to order the frames in the figure, and there is no obvious dependency between frame prediction and frame prevalence. The accuracy of a classifier increases if the word choice correctly represents a frame.

Ensemble of logistic regressions. [32] performed deductive analysis as a classification task based on an ensemble of logistic regression functions. They represented each article as a bag-of-words model and calculated TF-IDF scores for each term. Instead of a plain TF component they applied a sub-linear frequency scaling of 1 + log(TF), and used l2 normalization and IDF weights smoothed by adding 1. Each frame had 3 to 4 framing devices, and each article in the training and testing sets was manually labeled 1 or 0 depending on whether a framing device was found in the text or not. Factor analysis on the framing devices described which ones are more influential, and, as can be seen in Figure 12, framing devices C2 and E2 were discarded due to low factor loadings and low coherence with the frames.

Manually annotated documents (5875) were used for training and testing purposes based on ten-fold cross-validation. To measure the accuracy of the results the researchers applied two metrics: the agreement between coders or classifiers in an ensemble, and AUC. The results for both metrics are depicted in Figures 12a and 12b correspondingly.

(a) Intercoder and classifier agreement (b) AUC for ROC automated classification prediction
Figure 12: Results of frame identification with human coding, a single logistic classifier, and an ensemble of classifiers for each of the identifier questions [32]

The classifier ensemble showed better results compared to a single classifier, and the comparison to a small manually annotated data set (randomly chosen coders annotated 159 articles multiple times, approximately 3% of the overall number, to calculate inter-coder agreement) yielded somewhat better predictions in 7 out of 11 framing devices. Nonetheless, some framing devices are better classified than others. AUC indicated that the ensemble is more accurate in prediction by 20 percentage points on average.

The approach revealed accurate prediction results for some framing devices, and poor ones for others. The scholars attributed this to the ambiguous interpretation of the formulated frames and complex message characteristics. A comprehensive word choice and explained conceptions would help to improve the results.

[9] applied similar measures, and discussed the performance of a holistic approach, where a frame is coded as a whole, versus an indicator-based approach, which is frequently used for manual deductive coding and consists of searching for a frame by determining the presence or absence of its attributes. In addition to AUC and Krippendorff's Alpha (KA) for inter-coder agreement, they stated that the accuracy (AC) of the coincidence between manual and computer-assisted coding must be used. All three values together evaluate the correctness of a frame found in a text. The results showed that, after representing documents with TF-IDF in the classification task, the holistic approach performed better than the indicator-based one.
Moreover, with some- what drop of accuracy supervised machine learning (SML) algorithms can be noticed in a set of documents if it orig- inated another resource than the training set. The exper- iment showed results ranged from 0.79 to 0.96 for generic frames such as conflict, economic consequences, morality, and human interest, on which the scholars concluded that SML is suitable for issue specific deductive analysis. Support Vector Machines [18] attempted to identify different word choice between four news outlets. For the analysis they removed stop words, ap- plied porter stemmer, and extracted unigrams, bigrams, and trigrams of words to represent texts as bag-of-word model based on TF-IDF. To find similar articles or mates, they used the Best Reciprocal Hit (BRH) algorithm that works similar to bioinformatics algorithms that identify similar genes. Cosine similarity yielded the resulting vector space model, and allowed to extract top n nearest-neighbors for each document. The scholars also applied labels of article source for the training data. A linear SVM classifier was trained and tested with ten- fold cross validation, and the performance was measured by the break-even-point (BEP) ”which is a hypothetical point where precision (ratio of positive documents among retrieved ones) and recall (ratio of retrieved positive documents among all positive documents) meet when varying the threshold”[18]. One linear classifier was used for one pair of news outlets. The researches stated, that BEP reflects a measure of sep- arability between news outlets’ lexical choice. Figure 13 depicts BEP values with corresponding 2D MDS represen- tation and example of most influential words with highest TF-IDF score for two news outlets. 
Figure 13: BEP metric used to compare word choice of news outlets [18]

Though the result yielded separable clusters of the most specific words for each outlet, the approach is rather straightforward and applied SVM, which generally performs well for text classification problems [24]. Moreover, the problem was addressed as a comparison of word choice between news outlets, not as a search for frames within them.

Supervised Hierarchical Latent Dirichlet Allocation. Topic modeling (LDA) is usually utilized as an approach for inductive framing analysis, but [31] suggested the Supervised Hierarchical Latent Dirichlet Allocation (SHLDA) method, based on LDA, that aims at revealing not only the agenda-setting constituent but also how the issue is framed and how people speak about it. The author describes a frame as a second level of agenda-setting, containing the latent meaning of the text. LDA, also known as a topic modeling technique, is used to assign words from texts to abstract topics. SHLDA is a supervised method that reveals topic hierarchies and represents a frame with the most probable words.

Figure 14: Results of the SHLDA algorithm: formed frames and their development and polarization over time for each political party [31]

Figure 15: Polarity of words derived by SHLDA [31]

Each document is assigned a response value (label or numeric value) that represents the author's perspective, e.g., sentiment or ideology. A text is represented as a bag of sentences, and a sentence as a bag of words. Each word is then iteratively assigned to the frame it represents, and the output of the model is "a hierarchy of topics which is informed by a label" [31]. Figure 14 depicts the result of the analysis. The lower levels of the topic hierarchy were more specific about word choice and showed how the two political parties framed the issue according to their interests. The concept of the approach is similar to the previously described research with SVM, where the analysis lies between deductive and inductive methods: on the one hand, a label or frame is given in advance and does not need to be derived, but on the other hand, the framing devices of a frame are unknown, and the task is to find them. Apart from the topic hierarchy, the model provides a lexical regression parameter for each word, where the highest values show a positive association of a word and the lowest values a negative one. The lexical regression parameter allows analyzing results not only on a topic level but also specifying the valence of each word (Figure 15).

3.2.2 Hierarchical clustering

[27] proposed a clustering method for their deductive analysis of the biotechnology issue. They described the biotechnology frame with the framing devices suggested by Entman: problem definition, causal attribution, moral evaluation, and treatment recommendation. Manual inductive analysis relied on existing code-books for this topic and filled the framing devices' values with text elements obtained from the set of texts. This resulted in a specific code-book for the particular research.
The idea of the suggested deductive method is to manually code the presence/absence of each attribute element for each article, use intercoder agreement for the model features, and then cluster the elements with Ward's hierarchical method. Clustering of the small frame elements revealed hidden connections between them and enabled deriving different frames for various time periods, determining in parallel the significance of each frame. Although the authors performed all coding of the frame elements manually, as well as assigning meaningful names to the frames obtained from the clusters, the approach allowed obtaining 3 clusters, i.e., 3 frames, statistically supported by heterogeneity measures.

3.2.3 Homogeneity analysis

[45] performed framing analysis on the topic of refugees in Belgium, aiming to find and measure the presence of the derived victim and intruder frames in two groups of newspapers. The author suggested a qualitative method of frame derivation, which he later formalized in [46], and applied it in an inductive analysis. Its result is a code-book with specified framing and reasoning devices and particular values for each frame. A part of the code-book is shown in Figure 16, where numbers in brackets represent framing devices for the victim frame and letters for the intruder frame.

Figure 16: A sample frame matrix with framing devices [45]

The coders needed to indicate whether the framing devices appeared or not. 2-3 coders worked on the articles; therefore, Cohen's kappa was applied to calculate the intercoder reliability of each device. Devices with values less than 0.6 were discarded. To determine which devices had greater influence, the scholars relied on homogeneity analysis by means of alternating least squares (HOMALS), which has a similar meaning to MDS and projects values onto a lower-dimensional space, preserving the degree of significance of each value. Framing devices plotted close to each other in the two-dimensional space indicated similarity of articles.
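The 0.6 reliability cutoff applied to framing devices can be made concrete: Cohen's kappa corrects the raw agreement of two coders for the agreement expected by chance. A minimal sketch with hypothetical presence/absence codes (not data from [45]):

```python
def cohens_kappa(coder1, coder2):
    """Cohen's kappa: agreement between two coders corrected for
    the agreement expected by chance."""
    assert len(coder1) == len(coder2) and coder1
    n = len(coder1)
    observed = sum(a == b for a, b in zip(coder1, coder2)) / n
    labels = set(coder1) | set(coder2)
    expected = sum((coder1.count(l) / n) * (coder2.count(l) / n)
                   for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical presence/absence codes of one framing device by two coders.
device_coder1 = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
device_coder2 = [1, 1, 0, 0, 0, 1, 1, 0, 1, 1]
print(round(cohens_kappa(device_coder1, device_coder2), 3))  # below the 0.6 cutoff
```

Note that 80% raw agreement shrinks to a kappa of roughly 0.58 here, so under the study's rule this device would be discarded despite looking superficially reliable.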
Figure 17 depicts the result of the HOMALS analysis. The clusters formed by framing devices are clearly separable: intruder-frame devices occupy the left side of the plot, whereas victim-frame devices occupy the right side. The authors described the separation into upper and lower parts in Figure 17 as a distinction between journalists' perspectives: the bottom part is considered to have an in-group attitude, using pronouns such as we/our while talking about the issue, whereas the top part takes an out-group position, addressing the problem with pronouns they/their.

Figure 17: Homogeneity analysis of framing devices between intruder-frame (left) and victim-frame (right) [45]

Figure 18: Average scores for intruder-frame and victim-frame representing the difference between coverage in Flemish-language (orange) and French-language (green) newspapers [45]

As a further step, the scholars analyzed the devices that were positioned in the center of the plot. Qualitatively they derived that these devices are used interchangeably within articles and therefore could not be clearly assigned to one of the clusters. In order to measure "how much" of each frame is present in an article, they suggested using Dimension 1 to scale the influence of each particular framing device and summing the influence into a resulting index. Consequently, each article had two indices: one for the intruder frame and one for the victim frame. Grouped by news outlet, these values are depicted in Figure 18, where a green half-circle represents French-language newspapers and an orange one Flemish ones. The results show that 50% of frames were shared between Flemish-language and French-language newspapers, and while French-language newspapers tended to refer to the victim frame, Flemish-language outlets usually used both.

Figure 19: Framing devices of Israel's position in the Hamas conflict [44]

An important criterion of word choice is to cover exclusively the perspective of each particular frame. The paper demonstrated that well-formulated framing devices allow comparing word choice in a quantitative way. For example, apart from the newspaper comparison, the suggested index allows analyzing how each frame was referred to during a given time period.

3.2.4 Semantic Network Analysis

[44] conducted a deductive analysis based on Semantic Network Analysis (SNA). They represented sentences as semantic statements, forming a network. The core object is a predicate representing an event.
An event has a subject (the actor of the action), an object (the actor at whom the action is directed), and a source (the origin of the event or statement). The sentences are parsed into semantic constituents, and, given the overall semantic network of words and their roles, analysis of the text becomes a search on the network. Figure 19 depicts an example of the network. Since framing devices also consist of a subject, predicate, and object, the search for a frame can be done as a rule-based search for the elements of the framing devices. Based on this, the overall structure of the analysis comprises (1) syntactic analysis by parsing the text, (2) role extraction for the words in a sentence, and (3) determining frames by rule-based relation extraction from the obtained network, where the rules specify framing devices. For the evaluation, the authors coded 100 test sentences manually and, to assess the correspondence of manual and automated coding, used inter-coder reliability values such as Cohen's kappa and Krippendorff's alpha in conjunction with the F1 score. Because acceptable values of inter-coder reliability are not strictly defined, the inference from this comparison has no solid basis; nevertheless, the authors consider the results good based on the F1 score, which ranged between 0.71 and 0.83.

4. DISCUSSION

Originally framing analysis was a part of content analysis. Researchers from the social sciences conducted it manually based on qualitative methods for text analysis, but recently quantitative methods have become a more popular instrument [26]. Nevertheless, some researchers question the quality of the quantitative analyses and point to the weakness of current text analysis methods in detecting hidden meaning between the lines.

Methods for computer-assisted framing analysis exist both for inductive and deductive analyses. Their summary and additional information are presented in Appendices A and B. Some of the considered inductive approaches are still based mostly on the agenda-setting properties of a frame rather than on framing itself. Framing attributes, being usually nouns, describe what to think about (topic), but not how to think about it (interpretation). Frames are defined as salient pieces of information, and this salience comes from how frequently media or politicians refer to a specific concept. A simple interpretation of it lies behind the most frequent words in the text corpus, which tend to be a common starting point for inductive analysis (see Section 3.1.1). On the one hand, this approach follows the idea of word salience, but on the other hand, it misses the semantic connection between neighboring words.

Several approaches compared news articles based on the word choice and labeling of each news outlet. They extracted significant lexical elements based on syntactic properties of texts (Section 3.1.3), semantic properties (Section 3.1.4), or both (Sections 3.1.5 and 3.1.6). All extracted information designated differences in word choice but demanded extra qualitative interpretation of the words' polarity and interrelation. Moreover, the analyses did not intend to find how the perception of the same entity or concept changes depending on word choice, but rather searched for non-intersecting groups of words.

Frame construction with regard to word choice and labeling should meet the following requirements: 1. identify the main elements of a text, which could be answers to the 5W1H questions (see Section 2) or other interesting terms of the text; 2.
distinguish and separate different frames based on the word choice of each category; 3. find accompanying words that form the frame perception, i.e., sentiments.

Requirement 1 can be fulfilled with keyphrase extraction or Named Entity Recognition techniques; for requirement 2, a specific similarity measure should be applied to identify semantically related words; and requirement 3 can be addressed by sentiment analysis.

Deductive analysis approaches rely on classification methods (Section 3.2.1), clustering (Section 3.2.2), statistical methods (Section 3.2.3), or rule-based methods (Section 3.2.4). All methods depend on various properties of the already derived frames and on how they represent texts. Texts are usually represented as a table with attributes. The attributes (or features) are coded in a binary way or with TF-IDF. [37] suggested several features for text processing tasks, especially classification, that could also be tried in deductive framing analysis.

A coded set of articles remains the most utilized resource in deductive analysis. Prior coded articles are especially required for a classification task. Typically, articles are coded manually, and intercoder agreement is used to ensure that the class labels are of good quality. A significant property of frames is a high quality of word choice that exclusively represents the framing devices. But the small number of manually coded articles remains a problem for classification, as, roughly speaking, the bigger the training data set, the better the classification results. To enable fully automated coding, the following steps are required: (1) formulate the code-book's attributes as keywords and then search for them in the text, or (2) extract keywords from the text and calculate similarity measures with the code-book's attributes. Solution 1 could be addressed with concept search (C-Search) [22], where the main idea is to use semantic search and, if it fails, to switch to syntactic search.
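Solution (1), formulating code-book attributes as keywords and searching for them in the text, can be sketched as a naive keyword search. The code-book, its frames, and the keywords below are illustrative assumptions, not taken from any cited study; a real system would add stemming and a semantic fallback such as C-Search:

```python
import re

# Hypothetical mini code-book: each frame maps to keywords standing in
# for its framing devices (illustrative only, not taken from [45]).
CODEBOOK = {
    "victim": {"refugee", "protection", "suffering", "fled"},
    "intruder": {"illegal", "flood", "profiteer", "abuse"},
}

def frame_scores(text, codebook=CODEBOOK):
    """Count code-book keyword hits per frame, giving a crude
    per-frame presence score for the article."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {frame: sum(tokens.count(kw) for kw in keywords)
            for frame, keywords in codebook.items()}

article = ("Refugees who fled the war need protection, officials said, "
           "while critics warned of abuse of the asylum system.")
print(frame_scores(article))  # the victim frame dominates
```

Note that exact token matching already misses "refugees" against the keyword "refugee", which is precisely why stemming or semantic matching is needed in practice.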
C-Search is an information retrieval (IR) technique, and its evaluation showed better results than simple keyword search. For solution 2, keyword extraction techniques can be applied, e.g., [7]. Well-coded frames with respect to the word choice of each frame attribute are an obligatory requirement for any deductive approach applied afterwards.

5. CONCLUSION

In this work we provided a comprehensive literature review of the existing computer-assisted framing analysis approaches, which include inductive and deductive analyses. The considered approaches are semi-automated and usually include human interaction in the analysis pipeline. Moreover, only a few methods focus on word choice and labeling, and, to the best of our knowledge, none of them addresses the problem of identifying the word choice used for a single concept. The current work also suggested several ideas that could address the problem of framing by word choice and labeling for both inductive and deductive analysis.

6. REFERENCES

[1] Honest Reporting: Lack of context bias. http://honestreporting.com/news-literacy-defining-bias-lack-of-context/.
[2] Media bias in strategic word choice. http://www.aim.org/on-target-blog/media-bias-in-strategic-word-choice/.
[3] SIGKDD: The community for data mining, data science and analytics. http://www.kdd.org/.
[4] TerMine software. http://www.nactem.ac.uk/software/termine/.
[5] D. S. J. Allen. Media bias: 8 types [a classic, kinda], 2015.
[6] S. Ananiadou, D. Weissenbacher, B. Rea, E. Pieri, F. Vis, Y. Lin, R. Procter, and P. Halfpenny. Supporting frame analysis using text mining. In 5th International Conference on e-Social Science, 2009.
[7] A. Bougouin, F. Boudin, and B. Daille. TopicRank: Graph-based topic ranking for keyphrase extraction. In International Joint Conference on Natural Language Processing (IJCNLP), pages 543–551, 2013.
[8] A. E. Boydstun, D. Card, J. Gross, P. Resnik, and N. A. Smith. Tracking the development of media frames within and across policy issues. 2014.
[9] B. Burscher, D. Odijk, R. Vliegenthart, M. De Rijke, and C. H. De Vreese. Teaching the computer to code frames in news: Comparing two supervised machine learning approaches to frame analysis. Communication Methods and Measures, 8(3):190–206, 2014.
[10] D. Chong and J. N. Druckman. Framing theory. Annual Review of Political Science, 10:103–126, 2007.
[11] S. R. Corman, T. Kuhn, R. D. McPhee, and K. J. Dooley. Studying complex discursive systems. Human Communication Research, 28(2):157–206, 2002.
[12] C. E. Crawley. Localized debates of agricultural biotechnology in community newspapers: A quantitative content analysis of media frames and sources. Science Communication, 28(3):314–346, 2007.
[13] P. D'Angelo and J. A. Kuypers. Doing News Framing Analysis: Empirical and Theoretical Perspectives. Routledge, 2010.
[14] S. T. Dumais. Latent semantic analysis. Annual Review of Information Science and Technology, 38(1):188–230, 2004.
[15] T. Dunning. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61–74, 1993.
[16] R. M. Entman. Framing: Toward clarification of a fractured paradigm. Journal of Communication, 43(4):51–58, 1993.
[17] R. M. Entman. Projections of Power: Framing News, Public Opinion, and US Foreign Policy. University of Chicago Press, 2004.
[18] B. Fortuna, C. Galleguillos, and N. Cristianini. Detection of bias in media outlets with statistical learning methods. Text Mining, page 27, 2009.
[19] K. Frantzi, S. Ananiadou, and H. Mima. Automatic recognition of multi-word terms. International Journal on Digital Libraries, 3(2):117–132, 2000.
[20] G. Axtell, A. Bartley, et al. Radford University Core Handbook. Radford University.
[21] D. M. Garyantes and P. J. Murphy. Success or chaos? Framing and ideology in news coverage of the Iraqi national elections. International Communication Gazette, 72(2):151–170, 2010.
[22] F. Giunchiglia, U. Kharkevich, and I. Zaihrayeu. Concept search: Semantics enabled syntactic search. 2008.
[23] I. Hellsten, J. Dawson, and L. Leydesdorff. Implicit media frames: Automated analysis of public debate on artificial sweeteners. Public Understanding of Science, 19(5):590–608, 2010.
[24] T. Joachims. Text categorization with support vector machines: Learning with many relevant features.
In European Conference on Machine Learning, pages 137–142. Springer, 1998.
[25] D. Kahneman and A. Tversky. Choices, values, and frames. American Psychologist, 39(4):341, 1984.
[26] J. Matthes. What's in a frame? A content analysis of media framing studies in the world's leading communication journals, 1990-2005. Journalism & Mass Communication Quarterly, 86(2):349–367, 2009.
[27] J. Matthes and M. Kohring. The content analysis of media frames: Toward improving reliability and validity. Journal of Communication, 58(2):258–279, 2008.
[28] M. E. McCombs and D. L. Shaw. The agenda-setting function of mass media. Public Opinion Quarterly, 36(2):176–187, 1972.
[29] M. M. Miller. Frame mapping and analysis of news coverage of contentious issues. Social Science Computer Review, 15(4):367–378, 1997.
[30] D. Nadeau, P. Turney, and S. Matwin. Unsupervised named-entity recognition: Generating gazetteers and resolving ambiguity. 2006.
[31] V.-A. Nguyen, J. L. Boyd-Graber, and P. Resnik. Lexical and hierarchical topic regression. In Advances in Neural Information Processing Systems, pages 1106–1114, 2013.
[32] D. Odijk, B. Burscher, R. Vliegenthart, and M. De Rijke. Automatic thematic content analysis: Finding frames in news. In International Conference on Social Informatics, pages 333–345. Springer, 2013.
[33] Z. Pan and G. M. Kosicki. Framing analysis: An approach to news discourse. Political Communication, 10(1):55–75, 1993.
[34] Z. Papacharissi and M. de Fatima Oliveira. News frames terrorism: A comparative analysis of frames employed in terrorism coverage in US and UK newspapers. The International Journal of Press/Politics, 13(1):52–74, 2008.
[35] S. Park, S. Kang, S. Chung, and J. Song. NewsCube: Delivering multiple aspects of news to mitigate media bias. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 443–452. ACM, 2009.
[36] S. Piao, Y. Tsuruoka, and S. Ananiadou. Hyseas: A hybrid sentiment analysis system.
In Proceedings of the Fourth International Conference on Interdisciplinary Social Sciences, 2009.
[37] P. Przybyla, N. T. Nguyen, M. Shardlow, G. Kontonatsios, and S. Ananiadou. NaCTeM at SemEval-2016 Task 1: Inferring sentence-level semantic similarity from an ensemble of complementary lexical and sentence-level features. Proceedings of SemEval, pages 614–620, 2016.
[38] M. Recasens, C. Danescu-Niculescu-Mizil, and D. Jurafsky. Linguistic models for analyzing and detecting biased language. In ACL (1), pages 1650–1659, 2013.
[39] M. Scott. WordSmith Tools 6. Oxford: Oxford University Press, 2011.
[40] M. G. Sendén, S. Sikström, and T. Lindholm. "She" and "he" in news media messages: Pronoun use reflects gender biases in semantic contexts. Sex Roles, 72(1-2):40–49, 2015.
[41] Y. Tian and C. M. Stewart. Framing the SARS crisis: A computer-assisted text analysis of CNN and BBC online news reports of SARS. Asian Journal of Communication, 15(3):289–301, 2005.
[42] M. Touri and N. Koteyko. Using corpus linguistic software in the extraction of news frames: Towards a dynamic process of frame analysis in journalistic texts. International Journal of Social Research Methodology, 18(6):601–616, 2015.
[43] G. Tuchman. Making News: A Study in the Construction of Reality. 1978.
[44] W. van Atteveldt, T. Sheafer, and S. Shenhav. Automatically extracting frames from media content using syntactic analysis. In Proceedings of the 5th Annual ACM Web Science Conference, pages 423–430. ACM, 2013.
[45] B. Van Gorp. Where is the frame? Victims and intruders in the Belgian press coverage of the asylum issue. European Journal of Communication, 20(4):484–507, 2005.
[46] B. Van Gorp. Strategies to take subjectivity out of framing analysis. In Doing News Framing Analysis: Empirical and Theoretical Perspectives, pages 84–109, 2010.
[47] J. Woelfel. Artificial neural networks in policy research: A current assessment. Journal of Communication, 43(1):63–80, 1993.
[48] J. Woelfel and N. Stoyanoff. CATPAC: A neural network for qualitative analysis of text. In Annual Meeting of the Australian Marketing Association, Melbourne, Australia, 1993.
APPENDIX

A. SUMMARY TABLE OF COMPUTER-ASSISTED INDUCTIVE FRAMING ANALYSIS METHODS

Each entry lists: method name and source; information about the data; preprocessing; methods; output; résumé.

PCA on cosine similarity matrix [28]
• Data: Associated Press dispatches; 12.07.1984 – 27.06.1995; 1465 articles; keyword: wetlands
• Preprocessing: stop words and ambiguous words manually removed; number of most frequent words chosen by authors; 1 document = 1 list
• Methods: cosine similarity matrix of most frequent terms' co-occurrence; PCA; hierarchical clustering on 3 eigenvector values
• Output: table with frames' names and corresponding framing devices; visualization of frames in 3D space
• Résumé: semi-automated; agenda setting

PCA on covariance matrix [11]
• Data: Lexis-Nexis database; 1.01.1992 – 1.12.2004; 1156 articles; keywords: GMO, agricultural biotech*, etc.
• Preprocessing: stop words and ambiguous words manually removed; number of most frequent words chosen by authors; 1 document = 1 list
• Methods: PCA of most frequent words with varimax rotation; 8 most meaningful eigenvalues selected; terms with loading ≥ 0.3 form a frame
• Output: table with grouped framing devices; qualitative analysis to interpret results and name frames required
• Résumé: semi-automated; agenda setting

Self-organizing map [17]
• Data: CNN and BBC websites; 1.03.2003 – 1.09.2003; 730 articles; keyword: SARS
• Preprocessing: stop words and verbs removed; 1 document = 1 list; top 40 most frequent words manually ranked
• Methods: self-organizing map (unsupervised neural network) of most frequent words; hierarchical clustering based on Ward's method
• Output: table with grouped framing devices; qualitative analysis to interpret results and name frames required
• Résumé: semi-automated; agenda setting

Semantic network [22]
• Data: New York Times website; 1980 – 2006; 54 articles; keywords: artificial sweetener, etc.
• Preprocessing: data normalization; stop words removed; all documents for 1 topic = 1 list; number of most frequent words chosen by authors
• Methods: cosine similarity matrix of most frequent terms' co-occurrence; elements ≥ threshold form a network; normalized similarity; graph visualization
• Output: frames are obtained by visual interpretation
• Résumé: semi-automated; agenda setting

Keyword extraction [41]
• Data: Lexis-Nexis database; 01.2010 – 08.2010; 40 articles; keywords: Greece, economy, debt, crisis, etc.
• Preprocessing: bag-of-words of an article; bag-of-words of all articles
• Methods: log-likelihood ratio calculation
• Output: list of concordances of keywords; full qualitative analysis required
• Résumé: semi-automated; agenda setting; word choice + labeling

Centering Resonance Analysis [33]
• Data: Lexis-Nexis database; 06.2006 – 06.2007; 218 articles; keywords: terrorist attacks, Iraq, Israel, Afghanistan
• Preprocessing: only noun phrases kept, other words form connections; pronouns dropped; stemming
• Methods: centering resonance analysis: frames are based on the words of the biggest influence or centrality
• Output: network of shared/distinct keywords between frames; qualitative analysis for result interpretation required
• Résumé: semi-automated; agenda setting; word choice

Latent Semantic Analysis [39]
• Data: Reuters website; 1996 – 1997; number of articles not specified
• Preprocessing: all articles; stop words removed
• Methods: Latent Semantic Analysis
• Output: ordered list of terms and their similarity value compared to a given word; qualitative interpretation required
• Résumé: automated; framing ±; word choice + labeling

Keyword-weight model [34]
• Data: "20 most publishing news providers"; 2007; 406 articles; "many important events"
• Preprocessing: bag-of-words; stop words removed; structure of text employed
• Methods: news structure-based extraction; keyword-weight model; aspect-based clustering
• Output: a system that shows a spectrum of articles with minimized framing bias
• Résumé: automated; framing; word choice

Named entity recognition [35]
• Data: no information given
• Preprocessing: no information given
• Methods: named entity extraction; term extraction; sentiment analysis
• Output: output from 3 independent systems; qualitative interpretation required
• Résumé: semi-automated; framing; word choice + labeling
B. SUMMARY TABLE OF COMPUTER-ASSISTED DEDUCTIVE FRAMING ANALYSIS METHODS

Each entry lists: method name and source; information about the data; preprocessing; methods; output; résumé.

Logistic Regression [7]
• Data: Lexis-Nexis database; 1990 – 2012; 9502 articles; keywords: smoking, immigration, same-sex marriage
• Preprocessing: manual frame coding on frame level; features: presence or absence of each word compared to the code vocabulary; classes: presence/absence of a frame; 1 classifier per frame
• Methods: logistic regression; obtained prediction is used as a measure of frame presence
• Output: visualization; trained model
• Résumé: semi-automated; agenda-setting; word choice and labeling

Ensemble of logistic regressions [31]
• Data: Dutch Lexis-Nexis database; 1995 – 2011; 5875 articles; keywords not mentioned
• Preprocessing: manual frame coding on attribute level; features: bag-of-words + TF-IDF score; classes: presence/absence of an attribute; 1 classifier per attribute
• Methods: factor analysis; ensemble of logistic regressions; obtained prediction is used as a measure of frame presence
• Output: table of comparison between predictions; trained model
• Résumé: semi-automated; framing; word choice + possibly labeling

Support Vector Machines [17]
• Data: websites of AJ, CNN, DN, and IHT; 31.03.2005 – 14.04.2006; 21552 articles, 675 left after preprocessing; keywords not mentioned
• Preprocessing: Porter stemmer; features: bag-of-words + TF-IDF score; classes: newspapers' names
• Methods: linear Support Vector Machines
• Output: table with BEP measure between pairs of news outlets; 2D visualization of MDS representation; list of most different word choice; trained model
• Résumé: automated; framing; word choice + labeling

Supervised Hierarchical Latent Dirichlet Allocation [30]
• Data: GovTrack; 109th US Congress; 5201 + 3060 turns; keywords not mentioned
• Preprocessing: classes: label classes derived from prior knowledge
• Methods: Supervised Hierarchical Latent Dirichlet Allocation
• Output: visualized hierarchy of topics based around given labels
• Résumé: automated; framing; word choice + labeling

Hierarchical clustering [25]
• Data: Lexis-Nexis database; 1992 – 2001; 1000 articles; keywords: biotech, genetic, genome, DNA
• Preprocessing: manual frame coding on attribute level; features: intercoder agreement
• Methods: Ward hierarchical clustering
• Output: table with attributes and values showing membership of a cluster/frame; qualitative analysis to interpret results and name frames required
• Résumé: semi-automated; framing; word choice + labeling

Homogeneity analysis [44]
• Data: Flemish- and French-language newspapers; 20.10.2000 – 29.04.2001 and 1.09.2002 – 31.08.2003; 1489 articles; keywords: refugees/asylum-seekers
• Preprocessing: manual frame coding on attribute level; intercoder agreement calculation, attributes < 0.6 discarded; features: intercoder agreement
• Methods: homogeneity analysis of framing attributes; calculation of an index of frame presence
• Output: visualization of homogeneity analysis; visualization of comparison between newspapers; visualization of frames' coverage dynamics; qualitative analysis to interpret some results required
• Résumé: semi-automated; framing; word choice + labeling

Semantic Network Analysis [43]
• Data: Lexis-Nexis database; 27.12.2008 – 20.01.2009; 20414 articles; keyword: Gaza
• Preprocessing: parsed sentences of attributes and texts
• Methods: rule-based Semantic Network Analysis
• Output: table with measures for each attribute found; trained model
• Résumé: semi-automated; framing; word choice + labeling