A Phrase-Frame List For Social Science Research Article Introductions
1. Accepted Manuscript
Xiaofei Lu, Jungwan Yoon, Olesya Kisselev
PII: S1475-1585(18)30115-2
DOI: 10.1016/j.jeap.2018.09.004
Reference: JEAP 692
To appear in: Journal of English for Academic Purposes
Received Date: 10 March 2018
Accepted Date: 14 September 2018
Please cite this article as: Xiaofei Lu, Jungwan Yoon, Olesya Kisselev, A phrase-frame list for
social science research article introductions, Journal of English for Academic Purposes (2018), doi:
10.1016/j.jeap.2018.09.004
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form.
Please note that during the production process errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.
Xiaofei Lu a,*, Jungwan Yoon a, Olesya Kisselev a
a Department of Applied Linguistics, The Pennsylvania State University, 234 Sparks
Building, University Park, PA 16802, USA
*Corresponding author. Tel.: +1 (814) 865-4692
E-mail addresses: xxl13@psu.edu (X. Lu), jxy204@psu.edu (J. Yoon), ovk103@psu.edu
(O. Kisselev).
Abstract
This study aimed to contribute to recent corpus-based efforts in compiling lists of academic
expressions by deriving a pedagogically useful list of phrase-frames for a specific part-genre, i.e.
research article introductions, in six social science disciplines. A combination of corpus statistics
was used to extract an initial set of phrase-frame candidates with adequate frequency, variant
diversity, and range across disciplines. These candidates were then manually filtered in several
steps to ensure their semantic completeness and pedagogical value. The resulting 370 five-word
phrase-frames and 84 six-word phrase-frames were analyzed structurally and functionally.
Evaluation of a random sample of 100 phrase-frames by a panel of academic writing instructors
and student writers indicated that the overwhelming majority of the phrase-frames were considered
pedagogically useful by either the instructors or the student writers, or both. The implications of
the current study for academic formulaic language research and of the phrase-frame list compiled
for academic writing pedagogy are considered.
Keywords: Academic writing; Formulaic language; Phrase-frames; Research article introductions
1. Introduction
In the past decades, many corpus-based studies have argued for the necessity of lists of
academic formulaic expressions for students and teachers of English for Academic Purposes
(EAP), explored the methodological issues involved in compiling such lists, and presented several
pedagogically useful lists of different types of academic expressions (e.g. Ackerman & Chen, 2013;
Biber et al., 2004; Martinez & Schmitt, 2012; Morley, 2015; Nattinger & DeCarrico, 1992;
Simpson-Vlach & Ellis, 2010). The EAP community has shown substantial interest in such lists
and recognized the importance of their continued improvement and enrichment. One type of
formulaic expression that is now increasingly seen as pedagogically relevant but that has not yet
been systematically tackled in previous corpus-based endeavors is phrase-frames (hereafter p-
frames), i.e. semi-fixed sequences that contain a variable slot that can be filled by different words,
e.g. the * of the study, where the open slot may be filled by aim, goal, and purpose, among others.
In this study, we extend recent efforts in compiling pedagogically useful lists of academic
formulaic expressions by deriving a list of p-frames frequently used in a corpus of a specific part-
genre, i.e. research article (RA) introductions, in six social science disciplines. In doing so, we
hope to contribute to the methodological discussion of the extraction and selection of candidate p-
frames as well as the usefulness and feasibility of functional categorization of the p-frames for
pedagogical applications.
1.1. Language as phraseology
A growing body of research accumulated in the past two decades in corpus linguistics has
significantly contributed to the contemporary understanding of "language as phraseology"
(Hunston, 2002, p.137). Language as phraseology is an assumption that positions the phrase, not
individual words, as a fundamental unit of meaning. This assumption has a long history in the field
of language pedagogy: language teachers were, possibly, the first to grapple with practical issues
of formulaicity in language (Herbst, 2011; Sinclair & Carter, 2004; Stubbs, 2009). Indeed, any
attempt by the learner to achieve a degree of native-like language ability inevitably results in
the realization that there exist multiple conventionalized ways of stringing words together in a
particular language that are unpredictable based on the traditionally understood "rules of grammar".
The need to identify and, subsequently, teach conventionalized sequences was recognized as early
as the 1930s. In fact, the coinage of the term collocation belongs to the English language teacher
and researcher Harold E. Palmer. Palmer's definition of a collocation as "a succession of two or
more words that must be learnt as an integral whole and not pieced together from its component
parts" (Palmer, 1933 as cited in Stubbs, 2009, p.17) was effectively adopted by John Sinclair, who
took the term and the idea behind it to build a whole new field around it (Herbst, 2011; Johansson,
2011; Stubbs, 2009).
The original notion of language as phraseology can be summarized along the lines of the Idiom
Principle, which posits that "a language user has available to him or her a large number of semi-
preconstructed phrases that constitute single choices, even though they might appear to be
analyzable into segments" (Sinclair, 1991, p.110). Corpus linguistics has provided evidence that
formulaicity is omnipresent in language, and that formulaic sequences are fundamental to the way
language is stored, processed, acquired and used (Hunston, 2002; Wray, 2008). Suggesting that
every conventionalized recurrent communicative function has a conventionalized linguistic form,
corpus linguistics has also effectively created a repository of research that showed which and what
types of multi-word sequences, or formulae, tend to appear in which or what types of language
modalities, registers, and genres1 (e.g. Biber et al., 2004; Hyland, 2008; Simpson-Vlach & Ellis,
2010). Biber et al. (2004) reported systematic differences in the distribution of lexical bundles of
different structures and functions across the registers of conversation, classroom teaching,
textbooks, and academic prose. Along similar lines, Simpson-Vlach and Ellis (2010) found that
while some formulas occur frequently in both academic speaking and writing, many formulas
occur primarily in one or the other. Based on these findings on mode and register variation in
formulaic language use, many recent studies set out to analyze formulaic sequences in specific
academic genres or part-genres, some with attention to inter-disciplinary variation. Hyland (2008),
for example, analyzed a corpus of academic writing that consisted of research articles, dissertations,
and theses, and systematically delineated variation in lexical bundle use among four disciplines.
1 Registers are textual varieties "associated with a particular situation of use (including particular communicative
purposes)" (Biber & Conrad, 2009, p.6); examples include conversation, classroom teaching, and textbooks. Genres
are "abstract, socially recognised ways of using language" (Hyland, 2007, p. 149); examples include research articles,
book reviews, and conference abstracts.
The understanding of language as phraseology and the insights from research into formulaic
language use in different modalities, registers, and genres have impacted thinking on the teaching
of academic language. In particular, it has been argued that incorporating analyses of formulaic
sequences and their functions in specific modalities, registers, and genres of texts may improve
the EAP writing curricula and learning outcomes (Coxhead & Byrd, 2007; Paltridge, 2004).
1.2. Pedagogically oriented lists of academic formulaic expressions
Recognizing the value of lists of academic formulaic expressions for facilitating EAP learners'
analyses and acquisition of formulaic sequences, corpus linguistics has made systematic efforts in
deriving various types of such lists from corpora of academic language, such as the Academic
Formulas List (AFL; Simpson-Vlach & Ellis, 2010) and the Phrasal Expressions List (PHRASE
List; Martinez & Schmitt, 2012). Comparable to the aims of lists of academic vocabulary (e.g.
Coxhead, 2000; Gardner & Davies, 2013), the AFL and the PHRASE List both aimed to represent
pedagogically relevant contiguous formulaic sequences (e.g. in the present study) to be
incorporated into curricula and teaching materials. Both lists were based on large corpora of
academic speech and writing, which included different types of texts such as research articles and
textbooks in the written portion, and lectures and seminars in the spoken portion. Methodologically,
both teams approached the selection of possible formulas using a combination of quantitative
corpus measures and qualitative analytical procedures, albeit with differential emphasis on them.
Simpson-Vlach and Ellis (2010) set out to generate a "formula teaching worth" (FTW) score
for each formula, i.e. a composite score comprised of frequency and mutual information (MI) that
could predict human judgement of the teaching worth of the formula. To that end, the authors
recruited judges with teaching and testing experience to rate a random sample of 108 formulas,
using such criteria as "whether or not they thought the phrase constituted 'a formulaic expression,
or fixed phrase, or chunk'" and "whether or not they thought the phrase was 'worth teaching, as
a bona fide phrase or expression'" (p.496). They subsequently ran multiple regression analysis on
the rated sample to derive beta coefficients of frequency and MI as predictors of human rater scores
and then used those coefficients to generate FTW scores for all formulas. They claimed that the
final AFL list contained, in theory, only pedagogically useful formulaic expressions.
Martinez and Schmitt (2012) cast doubt on the validity of the measure, whereby only a small
subset of data was analyzed by human raters. They questioned such "strict adherence to statistically
derived phrase selection" (p.306) and suggested that subjective judgment by raters with solid
teaching and testing backgrounds should be included as a determinant for item inclusion. They
incorporated several core criteria for the manual selection of candidates for the PHRASE List,
based largely on Wray's (2008) criteria for formulaicity. Examples of their criteria included "Is
the expression a Morpheme Equivalent Unit (MEU)?", "Is the expression semantically
transparent?", etc. These criteria were applied to all expressions extracted from the British National
Corpus. Martinez and Schmitt conceded that while such a methodology was "extremely time and
labor intensive," the resulting list was "clearly enhanced pedagogically" (p.310).
Cortes (2013) and Morley (2015) took the qualitative analytical dimension a step further by
aligning academic expressions with rhetorical functions. Both studies also focused on more
specific EAP genres or part-genres. Cortes (2013) extracted a list of lexical bundles from a corpus
of RA introductions and matched them to rhetorical moves and steps (e.g. one of the major was
matched to the step "claiming relevance of field"). Morley (2015) extracted a list of phrases from
a corpus of postgraduate dissertations and organized them by communicative function, e.g.
"introducing problems and limitations". He operationalized phrases broadly as expressions of
variable length that were deemed useful for a communicative function, including both bundle-style
items such as "The paper fails to specify ..." and rather long items such as "Difficulties arise,
however, when an attempt is made to implement the policy".
Taken together, existing corpus-based efforts in compiling lists of academic expressions all
highlight the importance of considering both quantitative corpus statistics and qualitative analyses in
identifying pedagogically relevant expressions. Meanwhile, these efforts have focused primarily
on continuous formulaic expressions. In the current study, we expand this line of research by
arguing that lists of academic formulaic expressions can be enriched with the inclusion of academic
p-frames, a position which we explore in detail in the section below. While we leave the alignment
of p-frames with rhetorical functions to a future study2, we note that studies such as Cortes' (2013)
and Morley's (2015) hold much promise in providing EAP students with not simply a list of
academic expressions but a repository of linguistic units coupled with their specific rhetorical
functions.
2 The compilation of the p-frame list is a necessary first step toward matching p-frames to rhetorical functions, and 1)
the motivation and methodology for deriving the p-frame list and the structural and functional analysis of the p-frames
extracted and 2) the systematic alignment of p-frames with rhetorical functions both warrant in-depth discussion. We
thus defer the reporting of results regarding frame-function alignment to a follow-up study.
1.3. Phrase-frames
In our view, lists of academic formulaic expressions will be greatly enhanced by the inclusion
of phrase-frames (Fletcher, 2006, 2011), i.e. multi-word sequences in which words form a "frame"
around a variable slot. The variants of a frame often form one or more semantically close or
functionally similar clusters. For example, the variable slot in the * of the study may be filled in
by purpose and goal as well as motivation and rationale. Many studies have underscored the
importance of p-frames in the academic discourse. Biber (2009), for instance, found that academic
writing relies heavily on discontinuous frames, while conversation relies more on continuous
lexical sequences. Gray and Biber (2013) further showed that p-frames are more variable in
academic writing than in conversation. These observations give us reason to believe that novice
EAP writers stand to benefit from pedagogical resources that provide not only continuous set
expressions such as on the one hand but also discontinuous frames and their variants. Teaching
variable frames may allow teachers to introduce more language while lessening the cognitive
demand on memory; at the same time, examining the variants that fill the variable slot may be a
valuable exercise in understanding the degree of formulaicity of various constructions.
To be fair, some of the lists reviewed above included some p-frames. For example, Simpson-
Vlach and Ellis (2010) listed a few p-frames on the AFL (e.g. [a/large/the] number of), which they
created by compounding multiple overlapping n-grams. Martinez and Schmitt (2012) also dealt
with expressions "with a variable component" on an ad hoc basis, but they went a step further:
when such an expression was noticed, they conducted "a careful follow-up search" in the corpus
to "identify all variable forms of that expression" (p.312). However, the limited number of p-
frames included in these lists were identified based on a procedure that extracts p-frames from
overlapping high-frequency n-grams (or lexical bundles). Such a procedure has been
problematized by Gray and Biber (2013), who reported that numerous discontinuous sequences
were not associated with lexical bundles. The extraction of p-frames thus necessitates a separate
procedure from that for extracting lexical bundles.
In a recent study, Cunningham (2017) explored the use of p-frames in 128 mathematics RAs.
She identified 180 p-frames specific to the mathematics discipline using a combination of
frequency, range, and keyness criteria. These p-frames were then analyzed structurally and
functionally. In her methodological procedure, the minimum frequency of each p-frame variant
was set to three. This procedure suffered the same limitation noted by Gray and Biber (2013),
leaving less frequent p-frame variants unanalyzed and many meaningful p-frames with diverse but
less frequent variants unidentified. The study's singular focus on the mathematics discipline also
calls for research on other disciplines.
The argument for general, non-disciplinary academic word and phrase lists is largely well
taken, since, after all, EAP courses do usually cater to students from diverse majors and
specializations. Nevertheless, linguistic variation across registers, genres, and disciplines is well
documented (Biber et al., 1999; Hyland, 2008), and there have been calls to "determine how meaning creation
works" in subsets of language (i.e. specific registers, genres, and disciplines) "that show a
specialized grammar and vocabulary" (Römer, 2010, p.308). This need
prompted us to investigate p-frames in a particular EAP part-genre, i.e. RA introductions, within
a set of social science disciplines. Several studies reviewed above (e.g. Cortes, 2013; Cunningham,
2017; Morley, 2015) have generated valuable insights into formulaic language use in specific
academic genres and disciplines. In line with such insights and keeping in mind the pedagogical
imperative, we also hold the view that presenting lists of formulaic expressions for specific genres
aids in fulfilling the promise of genre pedagogy, which sees "real benefits for learners as they pull
together language, content, and contexts" (Hyland, 2007, p.150).
1.4. Overview of the current study
This study aims to add to recent corpus-based efforts in compiling lists of academic
expressions by deriving a pedagogically useful list of p-frames for a specific part-genre, i.e. RA
introductions, in six social science disciplines. To this end, we first identified a set of p-frames
from a corpus of social science RA introductions and then subjected them to several manual
filtering procedures. All p-frames included in the final list were analyzed structurally and
functionally, and a subset was rated for pedagogical value by a panel of EAP writing instructors
and student writers. In what follows, we detail our methodological procedure, present the results
of the different steps of our procedure, and discuss the implications of our results for academic
formulaic language research and academic writing pedagogy.
2. Methodology
2.1. Corpus
The corpus used in the current study comprises 517,703 words of published RA introduction
sections sourced from the Corpus of Social Science Research Articles (COSSRA), developed by
our research team. COSSRA includes 600 RAs published in 2012-2016 in six social science
disciplines (Anthropology, Applied Linguistics, Economics, Political Science, Psychology, and
Sociology), with 100 RAs sampled from five journals in each discipline. The journals were
selected based on their impact factors, with their representativeness confirmed by two experts in
each discipline. For each journal, we first sampled 20 issues in the period 2012-2016 and
subsequently sampled one RA per issue. The introduction sections of the RAs were extracted and
each saved as a plain text file. All files were manually checked for errors resulting from the
conversion process, and information unnecessary for p-frame identification was eliminated,
including parenthetical citations, footnotes, and mathematical formulas.
While many RAs contain an introduction section with the heading "Introduction", some start
with an untitled section followed by a titled section. We included such untitled first sections in our
data as their primary rhetorical function was similar to the titled introduction sections. Table 1
summarizes the number of tokens of the RA introductions in each discipline. The subcorpus of
each discipline contained an approximately comparable number of tokens, with the exception of
Economics, which had lengthier introductions than other disciplines. While it would be ideal to
compile a similar-sized subcorpus for each discipline, we opted to maintain an equal number of
texts for each discipline.
Table 1
Composition of the corpus of social science research article introductions.
Discipline            Texts   Tokens    Proportion
Anthropology          100     76,263    14.7%
Applied Linguistics   100     63,357    12.2%
Economics             100     145,609   28.1%
Political Science     100     78,291    15.1%
Psychology            100     78,209    15.1%
Sociology             100     75,974    14.6%
Total                 600     517,703   100%
2.2. Procedure
The procedure for developing, analyzing, and assessing the p-frame list involved four stages:
initial candidate extraction, manual filtering, structural and functional analysis, and rater
assessment.
2.2.1. Automatic extraction of p-frame candidates
Several methodological issues needed to be considered in the initial candidate extraction stage.
The first had to do with the approach to p-frame identification. Previous studies employed either
the bundles-to-frame approach (e.g. Biber, 2009; Cunningham, 2017; Römer, 2010) or the fully
inductive approach (e.g. Fuster-Márquez & Pennock-Speck, 2015; Grabowski, 2015; Gray &
Biber, 2013). The former begins with identifying lexical bundles and then analyzes them to
determine p-frames. As mentioned above, this approach would fail to capture the full range of
variants of the p-frames identified and, as Gray and Biber (2013) noted, miss meaningful p-frames
with highly diverse, infrequent variants. To avoid these limitations, the current study adopted the
fully inductive approach, which identifies p-frames based on all continuous lexical sequences.
The second issue had to do with p-frame length. We began by extracting 4-word frames
following the practice in several previous studies (Fuster-Márquez & Pennock-Speck, 2015;
Grabowski, 2015; Gray & Biber, 2013; Römer, 2010). However, our preliminary analysis showed
that the majority of the 4-word frames were incomplete units or contained function words only
(e.g. the * of the). The pedagogical relevance of such units appeared questionable. We therefore
decided to focus on five- and six-word frames only. The increased length allowed us to identify p-
frames that are semantically more complete and more specific to the part-genre of RA
introductions.
Another important methodological consideration was to determine the optimal combination of
cut-off points of p-frame frequency, the number of variants, and the number of texts and disciplines
in which a p-frame appears. There exists a certain degree of arbitrariness among previous studies
in threshold setting. For example, depending on corpus size, researchers have set the frequency
threshold from 10 occurrences per million words (PMW) (e.g. Simpson-Vlach & Ellis, 2010) to 40
occurrences PMW (e.g. Biber et al., 2004). To establish the ideal cut-off points appropriate for the size of
our corpus, we conducted an explorative investigation using a range of threshold combinations.
As a result of this investigation, we settled on five-word p-frames with at least 16 occurrences
PMW and six-word p-frames with at least 12 occurrences PMW; additionally, each p-frame should
have two or more variants and should occur in three or more texts across two or more disciplines.
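As a rough illustration of what these normalized thresholds imply for a corpus of this size, the conversion to raw counts can be sketched as follows. The raw counts are not reported in the text; the sketch assumes the thresholds were applied to the full 517,703-word corpus, and the function name is hypothetical.

```python
import math

CORPUS_TOKENS = 517_703  # corpus size reported in Section 2.1


def min_raw_count(per_million: float, corpus_tokens: int = CORPUS_TOKENS) -> int:
    """Smallest whole-number count whose per-million rate meets the threshold."""
    return math.ceil(per_million * corpus_tokens / 1_000_000)


five_word_min = min_raw_count(16)  # threshold for five-word p-frames
six_word_min = min_raw_count(12)   # threshold for six-word p-frames
print(five_word_min, six_word_min)
```

Under this assumption, the thresholds correspond to at least 9 raw occurrences for five-word p-frames and at least 7 for six-word p-frames.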
After all the necessary decisions were made, we extracted five- and six-word p-frames using
kfNgram (Fletcher, 2011). For each p-frame, kfNgram provides its token count, a list of its variants,
and the token count for each variant. The initial candidate p-frames extracted included all possible
p-frames with a single variable slot in any position. However, we decided to discard p-frames with
an initial variable slot as most of them crossed phrasal or clausal boundaries (e.g. * a growing body
of with variants of to, so, reasoning, etc.). With these p-frames excluded, the initial candidate list
included 594 five-word p-frames and 167 six-word p-frames.
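The extraction logic described above can be approximated with a minimal sketch. This is not kfNgram's implementation; the function name, the input format (pre-tokenized, lowercased texts tagged with a text ID and a discipline), and the default parameter values are assumptions made for illustration only.

```python
from collections import defaultdict


def extract_pframes(texts, n, min_count, min_variants=2, min_texts=3,
                    min_disciplines=2):
    """Derive n-word p-frames with one non-initial variable slot.

    texts: iterable of (text_id, discipline, tokens) triples, where tokens
    is a list of lowercased word strings (hypothetical input format).
    """
    variants = defaultdict(lambda: defaultdict(int))  # frame -> filler -> count
    text_range = defaultdict(set)                     # frame -> text ids
    disc_range = defaultdict(set)                     # frame -> disciplines

    for text_id, discipline, tokens in texts:
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Open each position in turn, skipping the initial one
            # (frames with an initial slot were discarded in the study).
            for slot in range(1, n):
                frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
                variants[frame][gram[slot]] += 1
                text_range[frame].add(text_id)
                disc_range[frame].add(discipline)

    kept = {}
    for frame, fillers in variants.items():
        if (sum(fillers.values()) >= min_count
                and len(fillers) >= min_variants
                and len(text_range[frame]) >= min_texts
                and len(disc_range[frame]) >= min_disciplines):
            kept[" ".join(frame)] = dict(fillers)
    return kept
```

Because frames are built from every n-gram rather than from pre-selected high-frequency bundles, a frame whose individual variants are each rare can still pass the frequency threshold, which is the motivation given above for the fully inductive approach.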
2.2.2. Manual filtering
The candidate list required manual scrutiny to exclude p-frames that were not meaningful or
pedagogically relevant. This involved a considerable amount of concordance analysis using
AntConc 3.5.0 (Anthony, 2017), in which each p-frame was examined in its original contexts of
use and filtered using three criteria. First, frames which were linguistically incomplete (e.g. of the
paper is *) or crossed clausal boundaries (e.g. organized as follows the * section) were excluded.
Second, frames which could be subsumed under larger frames were discarded. For example, the
article is * as was removed as it was part of the article is * as follows. Finally, frames which could
be better represented as a lexical bundle (e.g. on the one hand *, where the slot can be filled by
virtually any word) were also excluded. To avoid researcher bias, only items marked as "exclude"
by all researchers were excluded. This procedure resulted in a final list of 370 five-word p-frames
and 84 six-word p-frames.
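Two parts of this filtering step lend themselves to a simple sketch: flagging frames that are contained in a longer frame, and applying the unanimity rule for exclusion. The filtering in the study was carried out manually through concordance analysis; the helper names and the vote format below are hypothetical and serve only to make the two rules concrete.

```python
def is_subsumed(short_frame: str, long_frame: str) -> bool:
    """True if short_frame's words occur as a contiguous run inside long_frame."""
    short, long_ = short_frame.split(), long_frame.split()
    if len(short) >= len(long_):
        return False
    return any(long_[i:i + len(short)] == short
               for i in range(len(long_) - len(short) + 1))


def unanimously_excluded(votes: dict) -> bool:
    """votes maps each researcher to 'exclude' or 'keep' (assumed labels).

    An item is dropped only if every researcher marked it 'exclude'.
    """
    return bool(votes) and all(v == "exclude" for v in votes.values())


# Example from the text: the five-word frame is part of the six-word frame.
print(is_subsumed("the article is * as", "the article is * as follows"))
```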
2.2.3. Structural and functional classification
The final entries underwent structural and functional analyses based on Gray and Biber's (2013)
structural taxonomy and Simpson-Vlach and Ellis' (2010) functional taxonomy. It was our hope
that information on the structure and function of each p-frame would enhance the usefulness of
the list for EAP teachers and students.
Gray and Biber (2013) suggested three structural categories: (a) verb-based frames (frames
containing one or more verbs, e.g. must be * to); (b) other-content-word frames (frames containing
one or more content words except verbs, e.g. on the * hand); and (c) function-word frames (frames
containing only function words, e.g. the * of this).
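The precedence among these three categories can be made explicit with a small sketch. It assumes that the word class of each fixed (non-slot) word has already been determined by hand or by a tagger; the function name and the three class labels are hypothetical.

```python
def structural_category(fixed_word_classes):
    """Classify a frame from the classes of its fixed (non-slot) words.

    fixed_word_classes: iterable of 'verb', 'content', or 'function'
    (assumed labels, one per fixed word).
    """
    classes = set(fixed_word_classes)
    if "verb" in classes:
        return "verb-based"          # any verb takes precedence
    if "content" in classes:
        return "other-content-word"  # content word(s) but no verb
    return "function-word"           # function words only


# "on the * hand": on, the = function words; hand = content word
print(structural_category(["function", "function", "content"]))
# "the * of this": the, of, this = function words
print(structural_category(["function", "function", "function"]))
```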
Simpson-Vlach and Ellis' (2010) functional taxonomy, adapted from that of Biber et al.
(2004), also posits three primary categories (referential, stance, and discourse expressions) with
several levels of sub-categories. We analyzed the frames based on the primary and second-level
categories only (see Section 3.2), as many of the more fine-grained functions were not applicable
to RA introductions. This classification required a substantial amount of concordance analysis. For
each occurrence of a p-frame, we determined its functional category based on the semantics of the
variant and the context in which it occurred. This approach inevitably resulted in some p-frames
being identified as multi-functional. For example, in this * it is was labeled as both "referential"
and "discourse-organizing", since it contained some variants referring to research context (e.g.
context and setting) and others to textual elements (e.g. essay and study).
While this variant-based approach to functional analysis has been commonly adopted by
previous researchers (e.g. Fuster-Márquez & Pennock-Speck, 2015; Römer, 2010), some
researchers have raised concerns regarding its contingent nature. Grabowski (2015) proposed a
fixed-frame-based approach that assigns functional labels to p-frames based on "the nature of their
fixed components rather than the semantics of slot-fillers and/or longer chunks of texts within a
given p-frame" (p.271), arguing that p-frames and lexical bundles are distinctive constructs and p-
frames can be functionally analyzed independently from their textual realizations. While recognizing the
rationale and value of the fixed-frame-based approach, we see the variant-based approach as well
suited for the purposes of the current study, as the discourse functions of the textual realizations
of the p-frames would likely prove useful for helping EAP writers acquire contextually appropriate
uses of those variants.
2.2.4. Instructor and student writer review
Before finalizing the list, we solicited reviews of a subset of the list from two experienced
academic writing instructors and two student writers enrolled in the MA TESL program at a large
public university in the U.S. While previous studies that integrated rater assessment of the
pedagogical value of their lists all relied solely on expert or teacher perspectives (e.g. Ackerman
& Chen, 2013; Simpson-Vlach & Ellis, 2010), we considered it informative to include learner
perspectives as well to ensure the list's potential usefulness to both EAP teachers and learners. For
the review, the raters were provided with a random sample of 50 five-word p-frames and 50 six-
word p-frames; for each p-frame, they also received information on its frequency, the number of
variant types, the actual variants, and the number of texts and disciplines in which it occurred.
They were then asked to rate each p-frame using the following four-point Likert scale:
1= pattern not recognizable; frame not useful
2= pattern recognizable; frame not useful
3= pattern recognizable; frame somewhat useful
4= pattern recognizable; frame very useful
In our view, the pedagogical value of the p-frames on the list will likely vary substantially
depending on the level of experience and expertise of the academic writer. As such, rather than
using the ratings of a small group of instructors and student writers to include and exclude specific
p-frames, we were more interested in obtaining a preliminary sense of the proportion of the p-
frames on the list that may be pedagogically useful to either EAP instructors or learners, or both.
Thus, in analyzing the results of the ratings, we considered a p-frame to be useful if it earned a
total score of at least 5 (i.e. an average score of at least 2.5) from either the instructor group or the learner group.
We note that our criterion of 5 points out of a maximum of 8 was more stringent than the criterion
of 9 points out of a maximum of 24 adopted by Ackerman and Chen (2013) in rating the
pedagogical usefulness of academic collocations. In total, 91 p-frames (91%) received a total score
of 5 or more from one or both groups, indicating that the overwhelming majority of the p-
frames on the list may be considered useful by either EAP instructors or learners, or both. The p-
frames that were scored under 5 by both groups were mostly bundle-like frames (e.g. play an
important role in *, with somewhat incoherent variants such as coordinating, modifying, etc.). This
observation was also supported by comments from the reviewers. For example, one teacher
reviewer noted that "I think play an important role is good, but question the role of the p-frame in
helping students develop their ideas further."
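The usefulness criterion applied to the ratings can be expressed as a short decision rule. The sketch below assumes two raters per group, each assigning a score from 1 to 4, as described above; the function name and the example scores are hypothetical and are not taken from the rating data.

```python
def is_useful(instructor_scores, student_scores, threshold=5):
    """A p-frame counts as useful if either group's summed score
    reaches the threshold (5 out of a maximum of 8 for two raters)."""
    return (sum(instructor_scores) >= threshold
            or sum(student_scores) >= threshold)


# Hypothetical ratings: instructors give 2 and 2, students give 3 and 2.
print(is_useful([2, 2], [3, 2]))
```

Because the rule takes the more favorable of the two group totals, a frame that only one group finds useful is still counted, which matches the stated aim of gauging usefulness to instructors, learners, or both.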
3. Results
This section presents the results of the structural and functional analysis of the final, filtered
list of p-frames extracted from the corpus of social science RA introductions. The first subsection
details the structural categorization of the p-frames based on Gray and Biber's (2013) taxonomy,
and the second the functional categorization based on Simpson-Vlach and Ellis' (2010) taxonomy.
More emphasis is placed on the functional analysis, given its greater importance in pedagogy. Our
analysis shows clear differences in both structure and function between five-word and six-word p-
frames.
3.1. Structural categorization
Table 2 summarizes the distribution of five-word and six-word p-frames by structure. The
majority of five-word p-frames are other-content-word based (64.3%, n = 238), followed by verb-
based frames (28.9%, n = 107). Only a small proportion of five-word p-frames consist entirely of
function words (6.8%, n = 25), partly because many function-word frames were found to be part
of a six-word p-frame and therefore removed in the manual filtering stage. For six-word p-frames,
verb-based frames account for a larger proportion (54.8%, n = 46) than other-content-word frames
(45.2%, n = 38). No six-word p-frame consists of function words only. The differences in the
distribution of five-word and six-word p-frames may not be surprising, given the increased
likelihood of encountering verbs or other content words in longer sequences.
Table 2
Distribution of the p-frames by structure.
Length      Verb-based frames   Other-content-word frames   Function-word frames   Total
Five-word   107 (28.9%)         238 (64.3%)                 25 (6.8%)              370
Six-word    46 (54.8%)          38 (45.2%)                  0 (0.0%)               84
All         153 (33.7%)         276 (60.8%)                 25 (5.5%)              454
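The percentages in Table 2 are row-wise shares of the counts. As a quick consistency check, the sketch below recomputes them from the counts reported in the table; the variable and function names are illustrative only.

```python
# Counts copied from Table 2 (structure categories by frame length).
counts = {
    "Five-word": {"verb": 107, "other_content": 238, "function": 25},
    "Six-word": {"verb": 46, "other_content": 38, "function": 0},
}


def row_percentages(row):
    """Convert a row of counts into percentages of the row total."""
    total = sum(row.values())
    return {key: round(100 * value / total, 1) for key, value in row.items()}


# The "All" row is the column-wise sum of the two length rows.
all_row = {key: sum(row[key] for row in counts.values())
           for key in counts["Five-word"]}

percentages = {length: row_percentages(row) for length, row in counts.items()}
percentages["All"] = row_percentages(all_row)
```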
Some examples of p-frames in each structural category are presented below. The words in
square brackets indicate the variants that fill the open slot in each frame.
a. Verb-based frames:
we find [little, no, strong, suggestive, weak] evidence that
the [aim, purpose, goal, objective] of this article is
b. Other-content-word frames:
a brief [account, description, overview, reminder, review] of the
in the present study we [investigated, examine(d), focus, test(ed)]
c. Function-word frames:
one of the most [basic, common, fundamental, important, prevalent, significant]
the [degree, extent, height, spread] to which the
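For readers who wish to work with the list programmatically, the bracket notation used in these examples can be expanded into full realizations of a frame. The following is a minimal sketch, assuming one bracketed slot per entry with comma-separated variants; the function name `expand_entry` is hypothetical, and shorthand such as examine(d) is left unexpanded.

```python
import re


def expand_entry(entry):
    """Expand a frame with one bracketed slot into its full variants."""
    match = re.search(r"\[([^\]]+)\]", entry)
    if match is None:
        return [entry]  # no open slot marked; return the entry unchanged
    fillers = [filler.strip() for filler in match.group(1).split(",")]
    return [entry[:match.start()] + filler + entry[match.end():]
            for filler in fillers]


variants = expand_entry("the [aim, purpose, goal, objective] of this article is")
```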
3.2. Functional categorization
Table 3 summarizes the distribution of five-word and six-word p-frames by primary function.
For five-word p-frames, referential frames make up the largest category (55.1%, n = 204), followed
by stance frames (18.9%, n = 70) and discourse organizing frames (18.9%, n = 70). For six-word
p-frames, however, discourse organizing frames account for the largest proportion (64.3%, n =
54), followed by referential frames (15.5%, n = 13) and stance frames (13.1%, n = 11). A small
proportion of five-word (7.0%, n = 26) and six-word (7.1%, n = 6) p-frames were found to be
multifunctional, with their functions varying depending on the variants.
Table 3
Distribution of the p-frames by primary function.
Length      Referential   Stance        Discourse organizing   Multifunctional   Total
Five-word   204 (55.1%)   70 (18.9%)    70 (18.9%)             26 (7.0%)         370
Six-word    13 (15.5%)    11 (13.1%)    54 (64.3%)             6 (7.1%)          84
All         217 (47.8%)   81 (17.8%)    124 (27.3%)            32 (7.0%)         454
In the rest of this section, we present some examples of p-frames in different primary and
second-level functional categories and discuss how they are used in context in social science RA
introductions. Entries with substantial overlap in terms of structure, function, and variants are
collapsed to capture their commonality, when doing so does not lose important details about the
structure, function, and variants of individual p-frames. For example, the p-frames the aim of this
*, the purpose of this *, the aim of the *, and the purpose of the * are collapsed into the aim/purpose
of this/the *. Due to space constraints, only one p-frame along with some of its most frequent
variants is provided to illustrate each functional category. The full list of p-frames and their
complete variants are provided in Appendix A.
3.2.1. Referential p-frames
As shown in Table 3, overall, referential p-frames make up the largest category. Table 4
summarizes the proportions of p-frames in the five subcategories of referential p-frames in
Simpson-Vlach and Ellis’ (2010) taxonomy. We did not find frames functioning as vagueness
markers (i.e. phrases indicating imprecise reference, e.g. and so on) in our data. The largest
functional subcategory for all frames was specification of attributes. A p-frame in this subcategory
identifies specific attributes of a following nominal or clause, as illustrated in a.1.
Table 4
Subcategories of referential p-frames.
Length      Specification of Attributes   Identification and Focus   Contrast and Comparison   Deictics and Locatives   Vagueness Markers   Total
Five-word   164 (80.4%)                   18 (8.8%)                  10 (4.9%)                 12 (5.9%)                0 (0.0%)            204
Six-word    10 (76.9%)                    0 (0.0%)                   2 (15.4%)                 1 (7.7%)                 0 (0.0%)            13
All         174 (80.2%)                   18 (8.3%)                  12 (5.5%)                 13 (6.0%)                0 (0.0%)            217
a. Referential p-frames
a.1. Specification of attributes, e.g. the presence or absence of [data, information, feature]
Ex. 1. In both experiments, we manipulate the presence or absence of [information]
intended to trigger …
Ex. 2. Moreover, our reliance on the presence or absence of [data] from a long-running
data series provides greater coverage …
Identification and focus was the second largest subcategory of referential expressions. In RA
introductions identification and focus frames either introduce the focus of previous literature or
establish the focus area of one’s own study, as illustrated in a.2.
a.2. Identification and focus, e.g. focus(-ing, -ed, -es) on the [consequences, effect(s), efficacy,
impact, implications] of
Ex. 3. In contrast, I focus on the [effects] of liquidity constraints on the extensive
margin …
Ex. 4. Studies of domestic courts usually focus on the [role] of courts in serving as deciders
of contentious issues.
Contrast and comparison frames are relatively small in number. Many frames in this category
are used to introduce one’s own research in relation to previous literature, sometimes highlighting
its unique focus, as illustrated in a.3.
a.3. Contrast and comparison, e.g. is [also, clearly, closely, inherently, positively] related to the
Ex. 5. This article is [also] related to the literature on savings, growth, and investment.
Ex. 6. My policy analysis is [closely] related to the personnel economics literature on
incentive contracts ...
Deictic and locative frames are also small in number. Such p-frames are often used to provide
contextual information about one’s research site or to contextualize one’s own research in a specific
time period or location relative to previous research, as illustrated in a.4.
a.4. Deictic and locative, e.g. at the [beginning, end, start, time] of the
Ex. 7. Notably, the turn to the corporeal at the [end] of the twentieth century has had a
salutary effect …
Ex. 8. I find that Christian and Islamic communities had, at the [time] of the survey, the
most positive impact on respect for religious freedom in Ibadan ...
3.2.2. Stance p-frames
Stance expressions provide a means for conveying one’s attitude, perspective, or position
toward an event, action, or a proposition. Simpson-Vlach and Ellis (2010) suggested six
subcategories of stance expressions, namely, hedges, epistemic stance, expressions of ability and
possibility, evaluation, obligation and directive, and intention/volition and prediction. However,
we did not find frames in the last two subcategories in our data (Table 5).
Table 5
Subcategories of stance p-frames.
Length      Hedges        Epistemic     Ability       Evaluation    Obligation    Intention     Total
Five-word   13 (18.6%)    19 (27.1%)    3 (4.3%)      35 (50.0%)    0 (0.0%)      0 (0.0%)      70
Six-word    1 (9.1%)      6 (54.5%)     0 (0.0%)      4 (36.4%)     0 (0.0%)      0 (0.0%)      11
All         14 (17.3%)    25 (30.9%)    3 (3.7%)      39 (48.1%)    0 (0.0%)      0 (0.0%)      81
Expressions in the hedges subcategory are known to play a crucial role in academic writing as
they allow writers to express uncertainty regarding the truth value of their statements, enabling
them not only to show modesty and reservation but also to avoid personal accountability (Hyland,
1994). In our data, hedges were often expressed through adjectives introduced by the copula be,
as illustrated in b.1.
b. Stance p-frames
b.1. Hedges, e.g. are [less, relatively, more, not, also] likely to be
Ex. 9. Instead, we have shown that as oil wealth rises, autocracies are [less] likely to be
ousted by groups that would initiate new dictatorships …
Ex. 10. Most importantly, empirical research has repeatedly shown that evangelical
Protestants are [relatively] likely to be lower class.
Epistemic stance frames are somewhat similar to hedges in that they also include expressions
of uncertainty. However, such frames have more to do with “knowledge claims or demonstrations”
and “reports of claims by others” (Simpson-Vlach & Ellis, 2010, p. 506), as illustrated in b.2.
b.2. Epistemic stance, e.g. may or may not be [useful, protective]
Ex. 11. ... they may instead do so in their first language, which may or may not be [useful]
in helping them develop literacy in their L2.
Ex. 12. However, perceived control may or may not be [protective] against mortality …
The ability and possibility frames express or introduce some possible action or proposition. In
social science RA introductions, the ability and possibility frames are often used to justify or
rationalize research focus or design, as illustrated in b.3.
b.3. Ability and possibility, e.g. allows us to [address, assess, explore, investigate, measure,
observe, study] the
Ex. 13. The time-series dimension allows us to [address] the potential endogeneity of
network ties.
Ex. 14. This allows us to [explore] the relationship between non-standard work hours and
fertility decisions from different perspectives …
The evaluation category formed the largest group of stance p-frames. P-frames in this category
are often used to evaluate one’s own or others’ research through evaluative adjectives, as illustrated
in b.4.
b.4. Evaluation, e.g. it is important to [note, emphasize, underscore, acknowledge] that
Ex. 15. It is important to [note] that turning points may vary in valence (negative or
positive), severity, and duration across individuals.
Ex. 16. Likewise, it is important to [emphasize] that our experiment only studies a small
sampling of the many decision environments …
3.2.3. Discourse organizing frames
Discourse organizing frames, the second largest group in our list, served four main functions
following Simpson-Vlach and Ellis’ (2010) taxonomy: metadiscourse, topic introduction, topic
elaboration, and discourse markers. The first subcategory, metadiscourse and textual reference,
includes frames that seem to be genre-specific, signaling the outline of the article, as illustrated in
c.1.
Table 6
Subcategories of discourse organizing p-frames.
Length      Metadiscourse   Topic introduction   Topic elaboration   Discourse markers   Total
Five-word   33 (47.1%)      13 (18.6%)           23 (32.9%)          1 (1.4%)            70
Six-word    32 (59.3%)      20 (37.0%)           1 (1.9%)            1 (1.9%)            54
All         65 (52.4%)      33 (26.6%)           24 (19.4%)          2 (1.6%)            124
c. Discourse organizing frames
c.1. Metadiscourse, e.g. the article/paper is [organized, structured] as follows
Ex. 17. The article is [structured] as follows: first, the literature on learners’ cognitive
processes in L2 pragmatics research is reviewed …
Ex. 18. The paper is [organized] as follows: Section 2 presents the social choice
environment.
The subcategory of topic introduction and focus signals the topic or the goal of the research.
This category, as Simpson-Vlach and Ellis (2010) noted, functionally overlaps with the
identification and focus category under referential expressions to some degree. The main
difference between the two is that the topic introduction and focus frames serve a more “global
discourse organizing function of introducing a topic,” as illustrated in c.2, whereas the
identification and focus frames have more to do with the “local referential function of identification”
(p. 507).
c.2. Topic introduction and focus, e.g. the primary [purpose, goal, aim, objective, contribution]
of this study/article/paper
Ex. 19. The primary [purpose] of this study was to classify the regime types for twenty-
four countries in the Americas ...
Ex. 20. Accordingly, the primary [goal] of this study is to test and extend the
metatheoretical framework proposed by Ferris and colleagues …
The topic elaboration subcategory relates to explicating and elaborating a topic previously
introduced. Many frames in this category include phrases signaling a cause/reason and effect
relationship, as illustrated in c.3.
c.3. Topic elaboration, e.g. to [assess, estimate, evaluate, examine, explore, measure, study,
test] the effect(s) of
Ex. 21. Hypotheses are developed to [evaluate] the effects of item positioning on response
behavior under the three mechanisms.
Ex. 22. The test was designed to [estimate] the effect of paid search on sales ...
Discourse markers generally serve to connect ideas smoothly and logically. Our list only
contains a few p-frames functioning as discourse markers (e.g. c.4), partially because many of
them were considered to be better represented as lexical bundles rather than p-frames (e.g. on the
other hand *) and thus removed at the manual filtering stage.
c.4. Discourse markers, e.g. in addition to the [literature, amount, outcome, above-mentioned]
Ex. 23. In addition to the [literature] on contract theory and mechanism design with limited
commitment, our analysis is related to two other strands of literature ...
Ex. 24. In addition to the [above-mentioned] reasons, an early start of FL learning has
been uncritically accompanied by expectations of superior L2 outcomes ...
The functional categories of the p-frames discussed above are not meant to be taken as
definitive and exclusive, as some have multiple functions, but rather as indications of the most
salient function they tend to fulfill in social science RA introductions.
4. Discussion
The purpose of this study was to derive a pedagogically useful list of p-frames from a corpus
of a specific part-genre, i.e. research article introductions, in six social science disciplines. This
research aim was motivated by insights into the importance of formulaic sequences in academic
English as well as the variation in formulaic language use across different registers, genres, and
disciplines (e.g. Biber et al., 2004; Cortes, 2013; Coxhead & Byrd, 2007; Cunningham, 2017;
Grabowski, 2015; Hyland, 2008), the success of previous corpus-based efforts in compiling lists of
academic formulaic expressions (e.g. Simpson-Vlach & Ellis, 2010; Martinez & Schmitt, 2012),
the absence of academic p-frame lists, and the perceived value of p-frame lists for EAP writing
pedagogy (Cunningham, 2017; Fletcher, 2006, 2011; Gray & Biber, 2013). The study thus serves
to fill an important gap in academic formulaic language research.
Recognizing the limitations of the bundles-to-frame approach to p-frame extraction (e.g.
Cunningham, 2017; Römer, 2010), we argued for and adopted the fully inductive approach in
which p-frames are identified based on all continuous lexical sequences, rather than just lexical
bundles, found in the corpus (Fuster-Márquez & Pennock-Speck, 2015; Grabowski, 2015; Gray &
Biber, 2013). A combination of corpus statistics was used to extract an initial set of p-frame
candidates with adequate frequency, variant diversity, and range across disciplines using kfNgram
(Fletcher, 2011). These candidates were then manually filtered in several steps to ensure their
semantic completeness and pedagogical value. The resulting 370 five-word p-frames and 84 six-
word p-frames were analyzed using Gray and Biber’s (2013) structural taxonomy and Simpson-
Vlach and Ellis’ (2010) functional taxonomy. Overall, the majority of p-frames (60.8%) were
other-content-word (excluding verbs) frames. By contrast, although Cunningham (2017) used a
different taxonomy (Biber et al., 1999) for structural analysis, the majority of the p-frames
identified in that study from a corpus of mathematics research articles were verb-based. The functional
analysis was performed using a variant-based approach (e.g. Fuster-Márquez & Pennock-Speck,
2015; Römer, 2010), which examines the functions of specific realizations of each p-frame in
context, rather than a fixed-frame-based approach (Grabowski, 2015), which determines the
function of each p-frame based on its fixed components. We made this choice on the
consideration that the functional categories assigned to different variants of the p-frames in context
may prove useful in helping EAP learners acquire contextually appropriate uses of the p-frames
and their variants. The variant-based approach resulted in a subset of p-frames being categorized
in more than one functional category. Overall, referential p-frames accounted for the largest
proportion (47.8%), but the majority of six-word p-frames (64.3%) were discourse organizing
frames. The final list is organized by function and presented in Appendix A, with multifunctional
p-frames listed under separate categories with the corresponding variants.
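To make the fully inductive approach more concrete, the sketch below derives p-frame candidates directly from all contiguous n-grams and then filters them by frequency, variant diversity, and range across disciplines. It is only an illustration: it is not the algorithm implemented in kfNgram, and the function names, the threshold values, and the toy corpus are hypothetical placeholders rather than the settings used in this study. It also assumes that the single open slot may occur at any position in the n-gram.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_pframes(texts_by_discipline, n=5, min_freq=2,
                    min_variants=2, min_range=2):
    """Map each qualifying frame to its slot fillers and their counts.

    texts_by_discipline: dict of discipline name -> list of token lists.
    """
    fillers = defaultdict(lambda: defaultdict(int))  # frame -> filler -> count
    disciplines = defaultdict(set)                   # frame -> disciplines seen
    for discipline, texts in texts_by_discipline.items():
        for tokens in texts:
            for gram in ngrams(tokens, n):
                # Open one slot at a time to form candidate frames.
                for slot in range(n):
                    frame = gram[:slot] + ("*",) + gram[slot + 1:]
                    fillers[frame][gram[slot]] += 1
                    disciplines[frame].add(discipline)
    result = {}
    for frame, variants in fillers.items():
        frequency = sum(variants.values())
        if (frequency >= min_freq and len(variants) >= min_variants
                and len(disciplines[frame]) >= min_range):
            result[" ".join(frame)] = dict(variants)
    return result


# Toy corpus (hypothetical): one tokenized sentence per discipline.
toy_corpus = {
    "economics": [["the", "aim", "of", "this", "study", "is"]],
    "sociology": [["the", "purpose", "of", "this", "study", "is"]],
}
frames = extract_pframes(toy_corpus)
```

In this toy example only the frames whose slot is filled by both aim and purpose survive the three filters; a real application would still require the manual filtering steps described above.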
Evaluation of a random sample of 100 p-frames by a panel of academic writing instructors and
student writers indicated that the overwhelming majority (91%) of the p-frames on the list were
considered pedagogically useful to either the instructors or student writers, or both. While most
previous studies solicited review by experts only, such as instructors, testers, publishers, and
lexicographers (e.g. Ackerman & Chen, 2013; Simpson-Vlach & Ellis, 2010), we considered it
useful to include student or learner perspectives as well, as ultimately the list is intended to serve
their learning needs. Previous studies also commonly used the review by a large panel of experts
to eliminate candidate expressions that had already been filtered by the researchers (e.g. Ackerman
& Chen, 2013; Martinez & Schmitt, 2012). While we acknowledge the benefits of having a larger
panel of reviewers with diverse academic backgrounds, we also deem it difficult to adequately
represent the pedagogical needs of all academic writers, given the many criteria that should be
considered (e.g. L1 background, level of experience and expertise in academic writing, discipline,
etc.). As such, the use of any panel’s judgment as an absolute criterion to include or exclude candidate
p-frames does not necessarily constitute the optimal solution, as confirmed by our panel
members. It may indeed be more productive to leave room for EAP teachers and learners to make
their own judgment based on their specific pedagogical contexts and learning needs.
Some overlap exists between p-frames and other types of academic expressions. For example,
complete variants of the p-frames, when examined individually, are reminiscent of formulas and
phrases, and the fillers of the p-frames may remind one of collocations. However, different from
individual formulas and phrases, p-frames provide information about patterns and their variability,
and different from collocations presented in isolation, p-frames contextualize co-occurring words
in syntactic patterns. As such, the p-frame list constitutes a useful addition to existing lists of
academic vocabulary (Coxhead, 2000; Gardner & Davies, 2013), collocations (Ackerman & Chen,
2013), and continuous formulaic sequences (Martinez & Schmitt, 2012; Morley, 2015; Simpson-
Vlach & Ellis, 2010) reviewed earlier.
5. Conclusion
The current study has extracted, analyzed, and evaluated a pedagogically useful list of p-frames
from a corpus of social science research article introductions. Focusing on six social science
disciplines, this study sits somewhere in the middle of the discipline specificity continuum. While
we did not focus on inter-disciplinary variation, the identification of p-frames that are uniquely
useful in specific disciplines would certainly constitute a productive avenue of future research (e.g.
Cunningham, 2017). Additionally, although the p-frames identified in our study all occur in at
least two disciplines, the p-frame variants may be analyzed in terms of their specificity to
individual disciplines (e.g. Fuster-Márquez, 2014; Grabowski, 2015).
The focus on the specific part-genre of RA introductions may be both a limitation and a strength.
On the one hand, it limits the pedagogical value of the list compiled primarily to this part-genre.
On the other hand, awareness of linguistic variation across genres and part-genres is critical to the
development of EAP learners’ genre competence, and resources targeting high-stakes genres and
part-genres such as RA introductions will prove valuable for pedagogy aimed at promoting that
awareness (Cortes, 2013; Hyland, 2007). The research outcome generated in this study also paves
the way for our ongoing research on aligning p-frames with rhetorical moves and steps in RA
introductions, and on identifying p-frames for other RA part-genres.
The pedagogical applications of the p-frame list compiled need to be carefully considered. For
EAP courses that take a genre approach to teaching academic writing, the p-frame list can serve
as a useful resource for assisting students’ analysis of language features that characterize RAs.
However, in our view lists of different types of academic expressions are best used in an
integrative way to maximize their potential for promoting students’ genre competence. For
example, as students identify important collocations in RAs, the p-frame list can be a handy tool
to help them see the range of contexts or syntactic environments in which they occur in RAs.
Similarly, as students notice formulas that are frequently used in RAs, the p-frame list can help
them see patterns that such formulas fit in as well as related variants that they could use. We also
expect the list to serve as one of many useful reference tools to novice social science scholars as
they engage in RA writing. Our future research will investigate the pedagogical uses of this list in
genre-based academic writing classrooms and actual RA writing contexts. We also call for more
empirical research examining the feasibility and effectiveness of integrative pedagogical
applications of the different types of academic formulaic expression lists to validate existing lists,
identify best practices in using them, and inform future efforts in compiling new lists.
Appendix A. The complete phrase-frame list for social science research article introductions
The complete list of p-frames can be found at
http://www.personal.psu.edu/xxl13/download.html.
References
Ackerman, K., & Chen, Y-H. (2013). Developing the Academic Collocation List (ACL) – A
corpus-driven and expert-judged approach. Journal of English for Academic Purposes, 12,
235-247.
Anthony, L. (2017). AntConc (Version 3.5.0) [Computer Software]. Tokyo, Japan: Waseda
University. Available from http://www.laurenceanthony.net/software
Biber, D. (2009). A corpus-driven approach to formulaic language in English. International
Journal of Corpus Linguistics, 14, 275-311.
Biber, D., & Conrad, S. (2009). Register, genre, and style. Cambridge: Cambridge University
Press.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at ...: Lexical bundles in university teaching
and textbooks. Applied Linguistics, 25, 371-405.
Biber, D., Leech, G., Johansson, S., Conrad, S., & Finegan, E. (1999). Longman grammar of
spoken and written English. London: Longman.
Cortes, V. (2013). The purpose of this study is to: Connecting lexical bundles and moves in
research article introductions. Journal of English for Academic Purposes, 12, 33-43.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213-238.
Coxhead, A., & Byrd, P. (2007). Preparing writing teachers to teach the vocabulary and grammar
of academic prose. Journal of Second Language Writing, 16, 129-147.
Cunningham, K. J. (2017). A phraseological exploration of recent mathematics research articles
through key phrase frames. Journal of English for Academic Purposes, 25, 71-83.
Fletcher, W. H. (2006). “Phrases in English” Home. Available from http://phrasesinenglish.org/
Fletcher, W. H. (2011). KfNgram. Annapolis, MD: USNA.
Fuster-Márquez, M. (2014). Lexical bundles and phrase frames in the language of hotel websites.
English Text Construction, 7, 84-121.
Fuster-Márquez, M., & Pennock-Speck, B. (2015). Target frames in British hotel
websites. International Journal of English Studies, 15, 51-69.
Gardner, D., & Davies, M. (2013). A new academic vocabulary list. Applied Linguistics, 35(3),
305-327.
Grabowski, Ł. (2015). Phrase frames in English pharmaceutical discourse: A corpus-driven study
of intradisciplinary register variation. Research in Language, 13, 266-291.
Gray, B., & Biber, D. (2013). Lexical frames in academic prose and conversation. International
Journal of Corpus Linguistics, 18, 109-136.
Herbst, T. (2011). Choosing sandy beaches – Collocations, probabemes and the idiom principle.
In T. Herbst, S. Faulhaber, & P. Uhrig (Eds.), The phraseological view of language (pp. 27-
57). Berlin: Walter de Gruyter.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge: Cambridge University Press.
Hyland, K. (1994). Hedging in academic writing and EAP textbooks. English for Specific
Purposes, 13, 239-256.
Hyland, K. (2007). Genre pedagogy: Language, literacy and L2 writing instruction. Journal of
Second Language Writing, 16, 148-164.
Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific
Purposes, 27(1), 4-21.
Johansson, S. (2011). Corpus, lexis, discourse: a tribute to John Sinclair. In T. Herbst, S. Faulhaber,
& P. Uhrig (Eds.), The phraseological view of language (pp. 7-26). Berlin: Walter de Gruyter.
Martinez, R., & Schmitt, N. (2012). A phrasal expressions list. Applied Linguistics, 33, 299-320.
Morley, J. (2015). The Academic Phrasebank: An academic writing resource for students and
researchers. Manchester, UK: The University of Manchester.
Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford:
Oxford University Press.
Paltridge, B. (2004). Academic writing. Language Teaching, 37, 87-105.
Römer, U. (2010). Establishing the phraseological profile of a text type: The construction of
meaning in academic book reviews. English Text Construction, 3, 95-119.
Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list: New methods in phraseology
research. Applied Linguistics, 31, 487-512.
Sinclair, J. McH. (1991). Corpus concordance collocation. Oxford: Oxford University Press.
Sinclair, J. McH., & Carter, R. (Eds.) (2004). Trust the text: Language, corpus and discourse.
London/New York: Routledge.
Stubbs, M. (2009). Technology and phraseology: With notes on the history of corpus linguistics.
In U. Römer & R. Schulze (Eds.), Exploring the lexis-grammar interface (pp. 15-31).
Amsterdam/Philadelphia: John Benjamins.
Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford: Oxford University Press.
Xiaofei Lu is Associate Professor of Applied Linguistics and Asian Studies at The
Pennsylvania State University, where he directs the graduate programs in the Department
of Applied Linguistics. His research interests are primarily in corpus linguistics, intelligent
computer-assisted language learning, English for Academic Purposes, and second
language writing. He is the author of Computational Methods for Corpus Annotation and
Analysis (2014, Springer).
Jungwan Yoon is a Ph.D. candidate in the Department of Applied Linguistics at The
Pennsylvania State University. Her research interests include academic literacy
development, second language writing, corpus linguistics, and discourse analysis.
Olesya Kisselev is a Ph.D. candidate in the Department of Applied Linguistics at The
Pennsylvania State University. Before coming to Penn State, she was an instructor and
curriculum developer in the Russian Flagship Program at Portland State University. Her
research interests include corpus linguistics and discourse analysis, especially as they apply
to the study of various aspects of second language and heritage language acquisition.