Accepted Manuscript
A phrase-frame list for social science research article introductions
Xiaofei Lu, Jungwan Yoon, Olesya Kisselev
PII: S1475-1585(18)30115-2
DOI: 10.1016/j.jeap.2018.09.004
Reference: JEAP 692
To appear in: Journal of English for Academic Purposes
Received Date: 10 March 2018
Accepted Date: 14 September 2018
Please cite this article as: Xiaofei Lu, Jungwan Yoon, Olesya Kisselev, A phrase-frame list for
social science research article introductions, Journal of English for Academic Purposes (2018),
doi: 10.1016/j.jeap.2018.09.004
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form.
Please note that during the production process errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.
A phrase-frame list for social science research article introductions
Xiaofei Lu a,*, Jungwan Yoon a, Olesya Kisselev a
a Department of Applied Linguistics, The Pennsylvania State University, 234 Sparks
Building, University Park, PA 16802, USA
*Corresponding author. Tel.: +1 (814) 865-4692
E-mail addresses: xxl13@psu.edu (X. Lu), jxy204@psu.edu (J. Yoon), ovk103@psu.edu
(O. Kisselev).
Abstract
This study aimed to contribute to recent corpus-based efforts in compiling lists of academic
expressions by deriving a pedagogically useful list of phrase-frames for a specific part-genre, i.e.
research article introductions, in six social science disciplines. A combination of corpus statistics
was used to extract an initial set of phrase-frame candidates with adequate frequency, variant
diversity, and range across disciplines. These candidates were then manually filtered in several
steps to ensure their semantic completeness and pedagogical value. The resulting 370 five-word
phrase-frames and 84 six-word phrase-frames were analyzed structurally and functionally.
Evaluation of a random sample of 100 phrase-frames by a panel of academic writing instructors
and student writers indicated that the overwhelming majority of the phrase-frames were considered
pedagogically useful by either the instructors or the student writers, or both. The implications of
the current study for academic formulaic language research and of the phrase-frame list compiled
for academic writing pedagogy are considered.
Keywords: Academic writing; Formulaic language; Phrase-frames; Research article introductions
1. Introduction
In the past decades, many corpus-based studies have argued for the necessity of lists of
academic formulaic expressions for students and teachers of English for Academic Purposes
(EAP), explored the methodological issues involved in compiling such lists, and presented several
pedagogically useful lists of different types of academic expressions (e.g. Ackermann & Chen, 2013;
Biber et al., 2004; Martinez & Schmitt, 2012; Morley, 2015; Nattinger & DeCarrico, 1992;
Simpson-Vlach & Ellis, 2010). The EAP community has shown substantial interest in such lists
and recognized the importance of their continued improvement and enrichment. One type of
formulaic expression that is now increasingly seen as pedagogically relevant but that has not yet
been systematically tackled in previous corpus-based endeavors is phrase-frames (hereafter
p-frames), i.e. semi-fixed sequences that contain a variable slot that can be filled by different words,
e.g. the * of the study, where the open slot may be filled by aim, goal, and purpose, among others.
In this study, we extend recent efforts in compiling pedagogically useful lists of academic
formulaic expressions by deriving a list of p-frames frequently used in a corpus of a specific part-
genre, i.e. research article (RA) introductions, in six social science disciplines. In doing so, we
hope to contribute to the methodological discussion of the extraction and selection of candidate p-
frames as well as the usefulness and feasibility of functional categorization of the p-frames for
pedagogical applications.
1.1. Language as phraseology
A growing body of research accumulated in the past two decades in corpus linguistics has
significantly contributed to the contemporary understanding of “language as phraseology”
(Hunston, 2002, p.137). Language as phraseology is an assumption that positions the phrase, not
individual words, as a fundamental unit of meaning. This assumption has a long history in the field
of language pedagogy: language teachers were, possibly, the first to grapple with practical issues
of formulaicity in language (Herbst, 2011; Sinclair & Carter, 2004; Stubbs, 2009). Indeed, any
attempt by the learner to achieve a degree of native-like language ability inevitably results in
the realization that there exist multiple conventionalized ways of stringing words together in a
particular language that are unpredictable based on the traditionally understood “rules of grammar”.
The need to identify and, subsequently, teach conventionalized sequences was recognized as early
as the 1930s. In fact, the coinage of the term collocation belongs to the English language teacher
and researcher Harold E. Palmer. Palmer’s definition of a collocation as “a succession of two or
more words that must be learnt as an integral whole and not pieced together from its component
parts” (Palmer, 1933 as cited in Stubbs, 2009, p.17) was effectively adopted by John Sinclair, who
took the term and the idea behind it to build a whole new field around it (Herbst, 2011; Johansson,
2011; Stubbs, 2009).
The original notion of language as phraseology can be summarized along the lines of the Idiom
Principle, which posits that “a language user has available to him or her a large number of semi-
preconstructed phrases that constitute single choices, even though they might appear to be
analyzable into segments” (Sinclair, 1991, p.110). Corpus linguistics has provided evidence that
formulaicity is omnipresent in language, and that formulaic sequences are fundamental to the way
language is stored, processed, acquired and used (Hunston, 2002; Wray, 2008). Suggesting that
every conventionalized recurrent communicative function has a conventionalized linguistic form,
corpus linguistics has also effectively created a repository of research that showed which and what
types of multi-word sequences, or formulae, tend to appear in which or what types of language
modalities, registers, and genres1 (e.g. Biber et al., 2004; Hyland, 2008; Simpson-Vlach & Ellis,
2010). Biber et al. (2004) reported systematic differences in the distribution of lexical bundles of
different structures and functions across the registers of conversation, classroom teaching,
textbooks, and academic prose. Along similar lines, Simpson-Vlach and Ellis (2010) found that
while some formulas occur frequently in both academic speaking and writing, many formulas
occur primarily in one or the other. Based on these findings on mode and register variation in
formulaic language use, many recent studies set out to analyze formulaic sequences in specific
academic genres or part-genres, some with attention to inter-disciplinary variation. Hyland (2008),
for example, analyzed a corpus of academic writing that consisted of research articles, dissertations,
and theses, and systematically delineated variation in lexical bundle use among four disciplines.
1 Registers are textual varieties “associated with a particular situation of use (including particular communicative
purposes)” (Biber & Conrad, 2009, p.6); examples include conversation, classroom teaching, and textbooks. Genres
are “abstract, socially recognised ways of using language” (Hyland, 2007, p. 149); examples include research articles,
book reviews, and conference abstracts.
The understanding of language as phraseology and the insights from research into formulaic
language use in different modalities, registers, and genres have impacted thinking on the teaching
of academic language. In particular, it has been argued that incorporating analyses of formulaic
sequences and their functions in specific modalities, registers, and genres of texts may improve
the EAP writing curricula and learning outcomes (Coxhead & Byrd, 2007; Paltridge, 2004).
1.2. Pedagogically oriented lists of academic formulaic expressions
Recognizing the value of lists of academic formulaic expressions for facilitating EAP learners’
analyses and acquisition of formulaic sequences, corpus linguists have made systematic efforts to
derive various types of such lists from corpora of academic language, such as the Academic
Formulas List (AFL; Simpson-Vlach & Ellis, 2010) and the Phrasal Expressions List (PHRASE
List; Martinez & Schmitt, 2012). Comparable to the aims of lists of academic vocabulary (e.g.
Coxhead, 2000; Gardner & Davies, 2013), the AFL and the PHRASE List both aimed to represent
pedagogically relevant contiguous formulaic sequences (e.g. in the present study) to be
incorporated into curricula and teaching materials. Both lists were based on large corpora of
academic speech and writing, which included different types of texts such as research articles and
textbooks in the written portion, and lectures and seminars in the spoken portion. Methodologically,
both teams approached the selection of possible formulas using a combination of quantitative
corpus measures and qualitative analytical procedures, albeit with differential emphasis on them.
Simpson-Vlach and Ellis (2010) set out to generate a “formula teaching worth” (FTW) score
for each formula, i.e. a composite score comprised of frequency and mutual information (MI) that
could predict human judgement of the teaching worth of the formula. To that end, the authors
recruited judges with teaching and testing experience to rate a random sample of 108 formulas,
using such criteria as “whether or not they thought the phrase constituted ‘a formulaic expression,
or fixed phrase, or chunk’ ” and “whether or not they thought the phrase was ‘worth teaching, as
a bona fide phrase or expression’ ” (p.496). They subsequently ran multiple regression analysis on
the rated sample to derive beta coefficients of frequency and MI as predictors of human rater scores
and then used those coefficients to generate FTW scores for all formulas. They claimed that the
final AFL list contained, in theory, only pedagogically useful formulaic expressions.
Martinez and Schmitt (2012) cast doubt on the validity of the measure, whereby only a small
subset of data was analyzed by human raters. They questioned such “strict adherence to statistically
derived phrase selection” (p.306) and suggested that subjective judgment by raters with solid
teaching and testing backgrounds should be included as a determinant for item inclusion. They
incorporated several core criteria for the manual selection of candidates for the PHRASE List,
based largely on Wray’s (2008) criteria for formulaicity. Examples of their criteria included “Is
the expression a Morpheme Equivalent Unit (MEU)?”, “Is the expression semantically
transparent?”, etc. These criteria were applied to all expressions extracted from the British National
Corpus. Martinez and Schmitt conceded that while such a methodology was “extremely time and
labor intensive,” the resulting list was “clearly enhanced pedagogically” (p.310).
Cortes (2013) and Morley (2015) took the qualitative analytical dimension a step further by
aligning academic expressions with rhetorical functions. Both studies also focused on more
specific EAP genres or part-genres. Cortes (2013) extracted a list of lexical bundles from a corpus
of RA introductions and matched them to rhetorical moves and steps (e.g. one of the major was
matched to the step ‘claiming relevance of field’). Morley (2015) extracted a list of phrases from
a corpus of postgraduate dissertations and organized them by communicative function, e.g.
‘introducing problems and limitations’. He operationalized phrases broadly as expressions of
variable length that were deemed useful for a communicative function, including both bundle-style
items such as “The paper fails to specify …” and rather long items such as “Difficulties arise,
however, when an attempt is made to implement the policy”.
Taken together, existing corpus-based efforts in compiling lists of academic expressions all
highlight the importance of considering both quantitative corpus statistics and qualitative analyses in
identifying pedagogically relevant expressions. Meanwhile, these efforts have focused primarily
on continuous formulaic expressions. In the current study, we expand this line of research by
arguing that lists of academic formulaic expressions can be enriched with the inclusion of academic
p-frames, a position which we explore in detail in the section below. While we leave the alignment
of p-frames with rhetorical functions to a future study2, we note that studies such as Cortes’ (2013)
and Morley’s (2015) hold much promise in providing EAP students with not simply a list of
academic expressions but a repository of linguistic units coupled with their specific rhetorical
functions.
2 The compilation of the p-frame list is a necessary first step toward matching p-frames to rhetorical functions, and 1)
the motivation and methodology for deriving the p-frame list and the structural and functional analysis of the p-frames
extracted and 2) the systematic alignment of p-frames with rhetorical functions both warrant in-depth discussion. We
thus defer the reporting of results regarding frame-function alignment to a follow-up study.
1.3. Phrase-frames
In our view, lists of academic formulaic expressions will be greatly enhanced by the inclusion
of phrase-frames (Fletcher, 2006, 2011), i.e. multi-word sequences in which words form a ‘frame’
around a variable slot. The variants of a frame often form one or more semantically close or
functionally similar clusters. For example, the variable slot in the * of the study may be filled in
by purpose and goal as well as motivation and rationale. Many studies have underscored the
importance of p-frames in academic discourse. Biber (2009), for instance, found that academic
writing relies heavily on discontinuous frames, while conversation relies more on continuous
lexical sequences. Gray and Biber (2013) further showed that p-frames are more variable in
academic writing than in conversation. These observations give us reason to believe that novice
EAP writers stand to benefit from pedagogical resources that provide not only continuous set
expressions such as on the one hand but also discontinuous frames and their variants. Teaching
variable frames may allow teachers to introduce more language while lessening the cognitive
demand on memory; at the same time, examining the variants that fill the variable slot may be a
valuable exercise in understanding the degree of formulaicity of various constructions.
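
To make the notion of a frame and its variants concrete, the following minimal Python sketch collects the words that fill the open slot of a given frame in a tokenized text. The function name `slot_fillers` and the sample sentences are illustrative assumptions introduced here; they are not part of the tooling or data used in the study.

```python
from collections import Counter


def slot_fillers(frame, tokens):
    """Count the words that fill the '*' slot of `frame` in `tokens`.

    `frame` is a space-separated string with exactly one '*' slot,
    e.g. "the * of the study". Matching is case-insensitive.
    """
    parts = frame.lower().split()
    slot = parts.index("*")
    toks = [t.lower() for t in tokens]
    fillers = Counter()
    for i in range(len(toks) - len(parts) + 1):
        window = toks[i:i + len(parts)]
        # Every fixed word must match; the slot position is left free.
        if all(p == w for j, (p, w) in enumerate(zip(parts, window)) if j != slot):
            fillers[window[slot]] += 1
    return fillers


# Invented sample text, used only to exercise the function.
sample = ("The purpose of the study was clear . "
          "The goal of the study is modest .").split()
variants = slot_fillers("the * of the study", sample)
```

In this toy input the slot is filled once each by purpose and goal. Inspecting such filler counts is the kind of exercise that can show how open or how formulaic a given frame is.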
To be fair, some of the lists reviewed above included some p-frames. For example, Simpson-
Vlach and Ellis (2010) listed a few p-frames on the AFL (e.g. [a/large/the] number of), which they
created by compounding multiple overlapping n-grams. Martinez and Schmitt (2012) also dealt
with expressions “with a variable component” on an ad hoc basis, but they went a step further:
when such an expression was noticed, they conducted “a careful follow-up search” in the corpus
to “identify all variable forms of that expression” (p.312). However, the few
p-frames included in these lists were identified using a procedure that extracts p-frames from
overlapping high-frequency n-grams (or lexical bundles). Such a procedure has been
problematized by Gray and Biber (2013), who reported that numerous discontinuous sequences
were not associated with lexical bundles. The extraction of p-frames thus necessitates a separate
procedure from that for extracting lexical bundles.
In a recent study, Cunningham (2017) explored the use of p-frames in 128 mathematics RAs.
She identified 180 p-frames specific to the mathematics discipline using a combination of
frequency, range, and keyness criteria. These p-frames were then analyzed structurally and
functionally. In her methodological procedure, the minimum frequency of each p-frame variant
was set to three. This procedure suffered the same limitation noted by Gray and Biber (2013),
leaving less frequent p-frame variants unanalyzed and many meaningful p-frames with diverse but
less frequent variants unidentified. The study’s singular focus on the mathematics discipline also
calls for research on other disciplines.
The argument for general, non-disciplinary academic word and phrase lists is largely well
taken, since, after all, EAP courses do usually cater to students from diverse majors and
specializations. Nevertheless, linguistic variation across registers, genres, and disciplines is well
documented (Biber et al., 1999; Hyland, 2008), and researchers have called for work to “determine
how meaning creation works” in subsets of language (i.e. specific registers, genres, and disciplines)
“that show a specialized grammar and vocabulary” (Römer, 2010, p.308). This need
prompted us to investigate p-frames in a particular EAP part-genre, i.e. RA introductions, within
a set of social science disciplines. Several studies reviewed above (e.g. Cortes, 2013; Cunningham,
2017; Morley, 2015) have generated valuable insights into formulaic language use in specific
academic genres and disciplines. In line with such insights and keeping in mind the pedagogical
imperative, we also hold the view that presenting lists of formulaic expressions for specific genres
aids in fulfilling the promise of genre pedagogy, which sees “real benefits for learners as they pull
together language, content, and contexts” (Hyland, 2007, p.150).
1.4. Overview of the current study
This study aims to add to recent corpus-based efforts in compiling lists of academic
expressions by deriving a pedagogically useful list of p-frames for a specific part-genre, i.e. RA
introductions, in six social science disciplines. To this end, we first identified a set of p-frames
from a corpus of social science RA introductions and then subjected them to several manual
filtering procedures. All p-frames included in the final list were analyzed structurally and
functionally, and a subset was rated for pedagogical value by a panel of EAP writing instructors
and student writers. In what follows, we detail our methodological procedure, present the results
of the different steps of our procedure, and discuss the implications of our results for academic
formulaic language research and academic writing pedagogy.
2. Methodology
2.1. Corpus
The corpus used in the current study comprises 517,703 words of published RA introduction
sections sourced from the Corpus of Social Science Research Articles (COSSRA), developed by
our research team. COSSRA includes 600 RAs published in 2012-2016 in six social science
disciplines (Anthropology, Applied Linguistics, Economics, Political Science, Psychology, and
Sociology), with 100 RAs sampled from five journals in each discipline. The journals were
selected based on their impact factors, with their representativeness confirmed by two experts in
each discipline. For each journal, we first sampled 20 issues in the period 2012-2016 and
subsequently sampled one RA per issue. The introduction sections of the RAs were extracted and
each saved as a plain text file. All files were manually checked for errors resulting from the
conversion process, and information unnecessary for p-frame identification was eliminated,
including parenthetical citations, footnotes, and mathematical formulas.
While many RAs contain an introduction section with the heading ‘Introduction’, some start
with an untitled section followed by a titled section. We included such untitled first sections in our
data as their primary rhetorical function was similar to the titled introduction sections. Table 1
summarizes the number of tokens of the RA introductions in each discipline. The subcorpus of
each discipline contained a roughly comparable number of tokens, with the exception of
Economics, which had lengthier introductions than other disciplines. While it would be ideal to
compile a similar-sized subcorpus for each discipline, we opted to maintain an equal number of
texts for each discipline.
Table 1
Composition of the corpus of social science research article introductions.
Discipline            Texts     Tokens   Proportion
Anthropology            100     76,263        14.7%
Applied Linguistics     100     63,357        12.2%
Economics               100    145,609        28.1%
Political Science       100     78,291        15.1%
Psychology              100     78,209        15.1%
Sociology               100     75,974        14.6%
Total                   600    517,703         100%
2.2. Procedure
The procedure for developing, analyzing, and assessing the p-frame list involved four stages:
initial candidate extraction, manual filtering, structural and functional analysis, and rater
assessment.
2.2.1. Automatic extraction of p-frame candidates
Several methodological issues needed to be considered in the initial candidate extraction stage.
The first had to do with the approach to p-frame identification. Previous studies employed either
the bundles-to-frame approach (e.g. Biber, 2009; Cunningham, 2017; Römer, 2010) or the fully
inductive approach (e.g. Fuster-Márquez & Pennock-Speck, 2015; Grabowski, 2015; Gray &
Biber, 2013). The former begins with identifying lexical bundles and then analyzes them to
determine p-frames. As mentioned above, this approach would fail to capture the full range of
variants of the p-frames identified and, as Gray and Biber (2013) noted, miss meaningful p-frames
with highly diverse, infrequent variants. To avoid these limitations, the current study adopted the
fully inductive approach, which identifies p-frames based on all continuous lexical sequences.
The second issue had to do with p-frame length. We began by extracting 4-word frames
following the practice in several previous studies (Fuster-Márquez & Pennock-Speck, 2015;
Grabowski, 2015; Gray & Biber, 2013; Römer, 2010). However, our preliminary analysis showed
that the majority of the 4-word frames were incomplete units or contained function words only
(e.g. the * of the). The pedagogical relevance of such units appeared questionable. We therefore
decided to focus on five- and six-word frames only. The increased length allowed us to identify p-
frames that are semantically more complete and more specific to the part-genre of RA
introductions.
Another important methodological consideration was to determine the optimal combination of
cut-off points of p-frame frequency, the number of variants, and the number of texts and disciplines
in which a p-frame appears. There exists a certain degree of arbitrariness among previous studies
in threshold setting. For example, depending on corpus size, researchers have set the frequency
threshold from 10 occurrences per million words (PMW) (e.g. Simpson-Vlach & Ellis, 2010) to 40
occurrences PMW (e.g. Biber et al., 2004). To establish cut-off points appropriate for the size of
our corpus, we conducted an exploratory investigation using a range of threshold combinations.
As a result of this investigation, we settled on five-word p-frames with at least 16 occurrences
PMW and six-word p-frames with at least 12 occurrences PMW; additionally, each p-frame should
have two or more variants and should occur in three or more texts across two or more disciplines.
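
Because the thresholds are stated per million words while the corpus contains 517,703 words, it can help to see what they imply in raw counts. The short Python sketch below performs that conversion. It assumes that the thresholds are applied to frequencies normalized against the whole corpus, which the text does not spell out, and the function name `min_raw_count` is introduced here purely for illustration.

```python
import math

CORPUS_TOKENS = 517_703  # corpus size reported in Section 2.1


def min_raw_count(pmw_threshold, corpus_tokens=CORPUS_TOKENS):
    """Smallest raw count whose normalized frequency reaches the threshold."""
    return math.ceil(pmw_threshold * corpus_tokens / 1_000_000)


five_word_min = min_raw_count(16)  # five-word frames: 16 occurrences PMW
six_word_min = min_raw_count(12)   # six-word frames: 12 occurrences PMW
```

Under this assumption, 16 occurrences PMW corresponds to roughly 8.3 raw occurrences, i.e. at least 9 in practice, and 12 occurrences PMW corresponds to roughly 6.2, i.e. at least 7.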
After all the necessary decisions were made, we extracted five- and six-word p-frames using
kfNgram (Fletcher, 2011). For each p-frame, kfNgram provides its token count, a list of its variants,
and the token count for each variant. The initial candidate p-frames extracted included all possible
p-frames with a single variable slot in any position. However, we decided to discard p-frames with
an initial variable slot as most of them crossed phrasal or clausal boundaries (e.g. * a growing body
of with variants of to, so, reasoning, etc.). With these p-frames excluded, the initial candidate list
included 594 five-word p-frames and 167 six-word p-frames.
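
The fully inductive extraction described above can be approximated with the following minimal Python sketch. It is not kfNgram and makes no claim to reproduce its output: it simply enumerates every n-gram, opens one slot at a time in any non-initial position, and keeps frames that meet the frequency, variant, text, and discipline criteria. The function name `extract_pframes`, its parameters, and the toy texts are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def extract_pframes(texts, n, min_count, min_variants=2,
                    min_texts=3, min_disciplines=2):
    """Return {frame: Counter of variants} for frames passing all thresholds.

    `texts` is a list of (discipline, tokens) pairs. Each frame has one '*'
    slot in any position except the first.
    """
    variants = defaultdict(Counter)   # frame -> slot-filler counts
    text_ids = defaultdict(set)       # frame -> ids of texts containing it
    disciplines = defaultdict(set)    # frame -> disciplines containing it
    for text_id, (discipline, tokens) in enumerate(texts):
        toks = [t.lower() for t in tokens]
        for i in range(len(toks) - n + 1):
            gram = toks[i:i + n]
            for slot in range(1, n):  # position 0 skipped: no initial slots
                frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
                variants[frame][gram[slot]] += 1
                text_ids[frame].add(text_id)
                disciplines[frame].add(discipline)
    return {
        " ".join(frame): fillers
        for frame, fillers in variants.items()
        if sum(fillers.values()) >= min_count
        and len(fillers) >= min_variants
        and len(text_ids[frame]) >= min_texts
        and len(disciplines[frame]) >= min_disciplines
    }


# Invented toy texts; min_count=3 is chosen for the toy data only.
toy = [
    ("Economics", "the aim of this article is to test".split()),
    ("Sociology", "the purpose of this article is to show".split()),
    ("Psychology", "the goal of this article is to ask".split()),
]
frames = extract_pframes(toy, n=5, min_count=3)
```

On the toy data, two frames survive: the * of this article and this article is to *. Sequences such as of this article is to recur in all three texts but have only one filler per slot, so they fail the variant criterion and would be better treated as lexical bundles.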
2.2.2. Manual filtering
The candidate list required manual scrutiny to exclude p-frames that were not meaningful or
pedagogically relevant. This involved a considerable amount of concordance analysis using
AntConc 3.5.0 (Anthony, 2017), in which each p-frame was examined in its original contexts of
use and filtered using three criteria. First, frames which were linguistically incomplete (e.g. of the
paper is *) or crossed clausal boundaries (e.g. organized as follows the * section) were excluded.
Second, frames which could be subsumed under larger frames were discarded. For example, the
article is * as was removed as it was part of the article is * as follows. Finally, frames which could
be better represented as a lexical bundle (e.g. on the one hand *, where the slot can be filled by
virtually any word) were also excluded. To avoid researcher bias, only items marked as “exclude”
by all researchers were excluded. This procedure resulted in a final list of 370 five-word p-frames
and 84 six-word p-frames.
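
Two of the filtering rules above lend themselves to a simple mechanical check: whether a shorter frame is contained in a longer one, and whether all researchers agreed on exclusion. The Python sketch below illustrates both under simplifying assumptions. The containment test is purely string-based, whereas the actual filtering relied on concordance analysis, and the function names and vote labels are hypothetical.

```python
def is_subsumed(shorter, longer):
    """True if `shorter` occurs as a contiguous token run inside `longer`."""
    short_toks, long_toks = shorter.split(), longer.split()
    if len(short_toks) >= len(long_toks):
        return False
    span = len(short_toks)
    return any(long_toks[i:i + span] == short_toks
               for i in range(len(long_toks) - span + 1))


def unanimously_excluded(votes):
    """True only when every researcher marked the item 'exclude'."""
    return len(votes) > 0 and all(v == "exclude" for v in votes)


# The example pair mentioned in the text.
subsumed = is_subsumed("the article is * as", "the article is * as follows")
# Hypothetical votes from three researchers.
drop_item = unanimously_excluded(["exclude", "exclude", "keep"])
```

Here the five-word frame is flagged as part of the six-word frame, while the item with one dissenting vote is retained, mirroring the conservative rule that a single "keep" vote is enough to keep a candidate.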
2.2.3. Structural and functional classification
The final entries underwent structural and functional analyses based on Gray and Biber’s (2013)
structural taxonomy and Simpson-Vlach and Ellis’ (2010) functional taxonomy. It was our hope
that information on the structure and function of each p-frame would enhance the usefulness of
the list for EAP teachers and students.
Gray and Biber (2013) suggested three structural categories: (a) verb-based frames (frames
containing one or more verbs, e.g. must be * to); (b) other-content-word frames (frames containing
one or more content words except verbs, e.g. on the * hand); and (c) function-word frames (frames
containing only function words, e.g. the * of this).
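
The decision order implied by these three categories can be sketched in a few lines of Python: a frame with any verb among its fixed words is verb-based; otherwise a frame with any content word is an other-content-word frame; otherwise it is a function-word frame. The word lists below are tiny, hand-made assumptions for illustration only; an actual analysis would rely on part-of-speech information or manual coding.

```python
# Tiny illustrative lexicons (assumptions, not taken from the taxonomy itself).
VERBS = {"be", "is", "are", "was", "must", "find", "play"}
FUNCTION_WORDS = {"the", "of", "this", "on", "to", "a", "in", "that",
                  "which", "one", "most"}


def structural_category(frame):
    """Assign a frame to one of the three structural categories."""
    fixed = [w for w in frame.lower().split() if w != "*"]
    if any(w in VERBS for w in fixed):
        return "verb-based"
    if any(w not in FUNCTION_WORDS for w in fixed):
        return "other-content-word"
    return "function-word"


# The three example frames cited in the text.
labels = {f: structural_category(f)
          for f in ("must be * to", "on the * hand", "the * of this")}
```

Only the fixed words are inspected, so the category does not depend on what fills the slot.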
Simpson-Vlach and Ellis’ (2010) functional taxonomy, adapted from that of Biber et al.
(2004), also posits three primary categories—referential, stance, and discourse expressions—with
several levels of sub-categories. We analyzed the frames based on the primary and second-level
categories only (see Section 3.2), as many of the more fine-grained functions were not applicable
to RA introductions. This classification required a substantial amount of concordance analysis. For
each occurrence of a p-frame, we determined its functional category based on the semantics of the
variant and the context in which it occurred. This approach inevitably resulted in some p-frames
being identified as multi-functional. For example, in this * it is was labeled as both ‘referential’
and ‘discourse-organizing’, since it contained some variants referring to research context (e.g.
context and setting) and others to textual elements (e.g. essay and study).
While this variant-based approach to functional analysis has been commonly adopted by
previous researchers (e.g. Fuster-Márquez & Pennock-Speck, 2015; Römer, 2010), some
researchers have raised concerns regarding its contingent nature. Grabowski (2015) proposed a
fixed-frame-based approach that assigns functional labels to p-frames based on “the nature of their
fixed components rather than the semantics of slot-fillers and/or longer chunks of texts within a
given p-frame” (p.271), arguing that p-frames and lexical bundles are distinct constructs and that
p-frames can be functionally analyzed independently of their textual realizations. While we recognize the
rationale and value of the fixed-frame-based approach, we see the variant-based approach as well
suited for the purposes of the current study, as the discourse functions of the textual realizations
of the p-frames would likely prove useful for helping EAP writers acquire contextually appropriate
uses of those variants.
2.2.4. Instructor and student writer review
Before finalizing the list, we solicited reviews of a subset of the list from two experienced
academic writing instructors and two student writers enrolled in the MA TESL program at a large
public university in the U.S. While previous studies that integrated rater assessment of the
pedagogical value of their lists all relied solely on expert or teacher perspectives (e.g. Ackermann
& Chen, 2013; Simpson-Vlach & Ellis, 2010), we considered it informative to include learner
perspectives as well to ensure the list’s potential usefulness to both EAP teachers and learners. For
the review, the raters were provided with a random sample of 50 five-word p-frames and 50 six-
word p-frames; for each p-frame, they also received information on its frequency, the number of
variant types, the actual variants, and the number of texts and disciplines in which it occurred.
They were then asked to rate each p-frame using the following four-point Likert scale:
1 = pattern not recognizable; frame not useful
2 = pattern recognizable; frame not useful
3 = pattern recognizable; frame somewhat useful
4 = pattern recognizable; frame very useful
In our view, the pedagogical value of the p-frames on the list will likely vary substantially
depending on the level of experience and expertise of the academic writer. As such, rather than
using the ratings of a small group of instructors and student writers to include and exclude specific
p-frames, we were more interested in obtaining a preliminary sense of the proportion of the p-
frames on the list that may be pedagogically useful to either EAP instructors or learners, or both.
Thus, in analyzing the results of the ratings, we considered a p-frame to be useful if it earned a
total score of 5 (or an average score of 2.5) from either the instructor group or the learner group.
We note that our criterion of 5 points out of a maximum of 8 was more stringent than the criterion
of 9 points out of a maximum of 24 adopted by Ackermann and Chen (2013) in rating the
pedagogical usefulness of academic collocations. In total, 91 p-frames (91%) received a total score
of 5 or more from one or both groups, indicating that the overwhelming majority of the
p-frames on the list may be considered useful by either EAP instructors or learners, or both. The
p-frames that were scored under 5 by both groups were mostly bundle-like frames (e.g. play an
important role in *, with somewhat incoherent variants such as coordinating, modifying, etc.). This
observation was also supported by comments from the reviewers. For example, one teacher
reviewer noted that “I think play an important role is good, but question the role of the p-frame in
helping students develop their ideas further.”
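
The usefulness criterion described in this section can be expressed as a short Python sketch. Each group has two raters using the four-point scale, so a group total ranges from 2 to 8, and a frame counts as useful when either group's total reaches 5. The frame labels and ratings below are hypothetical values invented to exercise the rule; they are not the study's data.

```python
USEFUL_TOTAL = 5  # threshold stated in the text: 5 of a possible 8 per group


def is_useful(instructor_scores, learner_scores, threshold=USEFUL_TOTAL):
    """A frame is useful if either two-rater group total meets the threshold."""
    return (sum(instructor_scores) >= threshold
            or sum(learner_scores) >= threshold)


# Hypothetical ratings: (instructor pair, learner pair) for three frames.
ratings = {
    "frame_a": ((4, 3), (2, 2)),
    "frame_b": ((2, 2), (3, 2)),
    "frame_c": ((2, 2), (1, 3)),
}
useful = {name: is_useful(*pairs) for name, pairs in ratings.items()}
share_useful = sum(useful.values()) / len(useful)
```

In this invented example, frame_a qualifies through the instructor group, frame_b through the learner group, and frame_c through neither. The either-group rule is deliberately inclusive, in keeping with the stated aim of gauging usefulness to instructors, learners, or both.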
3. Results
This section presents the results of the structural and functional analysis of the final, filtered
list of p-frames extracted from the corpus of social science RA introductions. The first subsection
details the structural categorization of the p-frames based on Gray and Biber’s (2013) taxonomy,
and the second the functional categorization based on Simpson-Vlach and Ellis’ (2010) taxonomy.
More emphasis is placed on the functional analysis, given its greater importance in pedagogy. Our
analysis shows clear differences in both structure and function between five-word and six-word p-
frames.
3.1. Structural categorization
Table 2 summarizes the distribution of five-word and six-word p-frames by structure. The
majority of five-word p-frames are other-content-word based (64.3%, n = 238), followed by verb-
based frames (28.9%, n = 107). Only a small proportion of five-word p-frames consist entirely of
function words (6.8%, n = 25), partly because many function-word frames were found to be part
of a six-word p-frame and therefore removed in the manual filtering stage. For six-word p-frames,
verb-based frames account for a larger proportion (54.8%, n = 46) than other-content-word frames
(45.2%, n = 38). No six-word p-frame consists of function words only. The differences in the
distribution of five-word and six-word p-frames may not be surprising, given the increased
likelihood of encountering verbs or other content words in longer sequences.
Table 2
Distribution of the p-frames by structure.
Length      Verb-based frames   Other-content-word frames   Function-word frames   Total
Five-word   107 (28.9%)         238 (64.3%)                 25 (6.8%)              370
Six-word    46 (54.8%)          38 (45.2%)                  0 (0.0%)               84
All         153 (33.7%)         276 (60.8%)                 25 (5.5%)              454
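The row percentages in Table 2 follow directly from the raw counts. As a minimal check, the Python sketch below recomputes them; the dictionary keys and the function name are labels chosen here for illustration.

```python
# Raw counts from Table 2, keyed by frame length and structural category.
counts = {
    "Five-word": {"verb": 107, "other_content": 238, "function": 25},
    "Six-word": {"verb": 46, "other_content": 38, "function": 0},
}


def row_percentages(row):
    """Convert one row of counts into percentages of the row total."""
    total = sum(row.values())
    return {category: round(100 * n / total, 1) for category, n in row.items()}


# The "All" row is the column-wise sum of the two length rows.
all_row = {
    category: sum(row[category] for row in counts.values())
    for category in counts["Five-word"]
}

percentages = {length: row_percentages(row) for length, row in counts.items()}
percentages["All"] = row_percentages(all_row)
```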
Some examples of p-frames in each structural category are presented below. The words in
square brackets indicate the variants that fill the open slot in each frame.
a. Verb-based frames:
we find [little, no, strong, suggestive, weak] evidence that
the [aim, purpose, goal, objective] of this article is
b. Other-content-word frames:
a brief [account, description, overview, reminder, review] of the
in the present study we [investigated, examine(d), focus, test(ed)]
c. Function-word frames:
one of the most [basic, common, fundamental, important, prevalent, significant]
the [degree, extent, height, spread] to which the
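The bracketed variants above are the words observed in the open slot of each frame. As a rough illustration of how such fillers could be collected for a known frame, the Python sketch below turns a frame with a `*` slot into a regular expression. This is not the procedure used in the study; the function names and the example sentences are hypothetical.

```python
import re


def frame_to_regex(frame):
    """Compile a p-frame such as 'we find * evidence that' into a regex
    whose single capture group corresponds to the open slot."""
    parts = [r"([\w-]+)" if tok == "*" else re.escape(tok) for tok in frame.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def slot_fillers(frame, sentences):
    """Return the lower-cased slot fillers found for the frame, in order."""
    pattern = frame_to_regex(frame)
    return [m.group(1).lower() for s in sentences for m in pattern.finditer(s)]


# Invented sentences for illustration only.
sentences = [
    "We find strong evidence that prices respond to demand.",
    "In contrast, we find no evidence that turnout changed.",
    "We find evidence that wages rose.",  # no slot filler, so no match
]

fillers = slot_fillers("we find * evidence that", sentences)
```

The slot pattern accepts word characters and hyphens, so a hyphenated filler such as above-mentioned would also be captured; multi-word fillers are deliberately excluded, since each frame has exactly one variable word.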
3.2. Functional categorization
Table 3 summarizes the distribution of five-word and six-word p-frames by primary function.
For five-word p-frames, referential frames make up the largest category (55.1%, n = 204), followed
by stance frames and discourse organizing frames (18.9%, n = 70 each). For six-word
p-frames, however, discourse organizing frames account for the largest proportion (64.3%, n =
54), followed by referential frames (15.5%, n = 13) and stance frames (13.1%, n = 11). A small
proportion of five-word (7.0%, n = 26) and six-word (7.1%, n = 6) p-frames were found to be
multifunctional, with their functions varying depending on the variants.
Table 3
Distribution of the p-frames by primary function.
Length      Referential    Stance        Discourse organizing   Multifunctional   Total
Five-word   204 (55.1%)    70 (18.9%)    70 (18.9%)             26 (7.0%)         370
Six-word    13 (15.5%)     11 (13.1%)    54 (64.3%)             6 (7.1%)          84
All         217 (47.8%)    81 (17.8%)    124 (27.3%)            32 (7.0%)         454
In the rest of this section, we present some examples of p-frames in different primary and
second-level functional categories and discuss how they are used in context in social science RA
introductions. Entries with substantial overlap in terms of structure, function, and variants are
collapsed to capture their commonality, when doing so does not lose important details about the
structure, function, and variants of individual p-frames. For example, the p-frames the aim of this
*, the purpose of this *, the aim of the *, and the purpose of the * are collapsed into the aim/purpose
of this/the *. Due to space constraints, only one p-frame along with some of its most frequent
variants is provided to illustrate each functional category. The full list of p-frames and their
complete variants are provided in Appendix A.
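The collapsing of overlapping entries described above can be sketched as a position-by-position merge. The Python example below is a simplified illustration under stated assumptions, not the procedure applied in compiling the list: it assumes the frames have equal length and that every combination of the merged alternatives is attested. It does not check the further condition that no important detail about function or variants is lost, which requires manual judgment.

```python
def collapse_frames(frames):
    """Merge equal-length frames by joining the distinct tokens found at
    each position with '/', keeping the order of first appearance."""
    token_rows = [frame.split() for frame in frames]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("frames must have the same number of tokens")
    collapsed = []
    for column in zip(*token_rows):
        alternatives = list(dict.fromkeys(column))  # de-duplicate, keep order
        collapsed.append("/".join(alternatives))
    return " ".join(collapsed)


frames = [
    "the aim of this *",
    "the purpose of this *",
    "the aim of the *",
    "the purpose of the *",
]
collapsed = collapse_frames(frames)  # 'the aim/purpose of this/the *'
```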
3.2.1. Referential p-frames
As shown in Table 3, overall, referential p-frames make up the largest category. Table 4
summarizes the proportions of p-frames in the five subcategories of referential p-frames in
Simpson-Vlach and Ellis’ (2010) taxonomy. We did not find frames functioning as vagueness
markers (i.e. phrases indicating imprecise reference, e.g. and so on) in our data. The largest
functional subcategory for all frames was specification of attributes. A p-frame in this subcategory
identifies specific attributes of a following nominal or clause, as illustrated in a.1.
Table 4
Subcategories of referential p-frames.
Length      Specification of attributes   Identification and focus   Contrast and comparison   Deictics and locatives   Vagueness markers   Total
Five-word   164 (80.4%)                   18 (8.8%)                  10 (4.9%)                 12 (5.9%)                0 (0.0%)            204
Six-word    10 (76.9%)                    0 (0.0%)                   2 (15.4%)                 1 (7.7%)                 0 (0.0%)            13
All         174 (80.2%)                   18 (8.3%)                  12 (5.5%)                 13 (6.0%)                0 (0.0%)            217
a. Referential p-frames
a.1. Specification of attributes, e.g. the presence or absence of [data, information, feature]
Ex. 1. In both experiments, we manipulate the presence or absence of [information]
intended to trigger ...
Ex. 2. Moreover, our reliance on the presence or absence of [data] from a long-running
data series provides greater coverage ...
Identification and focus was the second largest subcategory of referential expressions. In RA
introductions identification and focus frames either introduce the focus of previous literature or
establish the focus area of one’s own study, as illustrated in a.2.
a.2. Identification and focus, e.g. focus(-ing, -ed, -es) on the [consequences, effect(s), efficacy,
impact, implications] of
Ex. 3. In contrast, I focus on the [effects] of liquidity constraints on the extensive
margin ...
Ex. 4. Studies of domestic courts usually focus on the [role] of courts in serving as deciders
of contentious issues.
Contrast and comparison frames are relatively small in number. Many frames in this category
are used to introduce one’s own research in relation to previous literature, sometimes highlighting
its unique focus, as illustrated in a.3.
a.3. Contrast and comparison, e.g. is [also, clearly, closely, inherently, positively] related to the
Ex. 5. This article is [also] related to the literature on savings, growth, and investment.
Ex. 6. My policy analysis is [closely] related to the personnel economics literature on
incentive contracts ...
Deictic and locative frames are also small in number. Such p-frames are often used to provide
contextual information about one’s research site or to contextualize one’s own research in a specific
time period or location relative to previous research, as illustrated in a.4.
a.4. Deictic and locative, e.g. at the [beginning, end, start, time] of the
Ex. 7. Notably, the turn to the corporeal at the [end] of the twentieth century has had a
salutary effect ...
Ex. 8. I find that Christian and Islamic communities had, at the [time] of the survey, the
most positive impact on respect for religious freedom in Ibadan ...
3.2.2. Stance p-frames
Stance expressions provide a means for conveying one’s attitude, perspective, or position
toward an event, action, or a proposition. Simpson-Vlach and Ellis (2010) suggested six
subcategories of stance expressions, namely, hedges, epistemic stance, expressions of ability and
possibility, evaluation, obligation and directive, and intention/volition and prediction. However,
we did not find frames in the last two subcategories in our data (Table 5).
Table 5
Subcategories of stance p-frames.
Length      Hedges        Epistemic     Ability     Evaluation    Obligation   Intention   Total
Five-word   13 (18.6%)    19 (27.1%)    3 (4.3%)    35 (50.0%)    0 (0.0%)     0 (0.0%)    70
Six-word    1 (9.1%)      6 (54.5%)     0 (0.0%)    4 (36.4%)     0 (0.0%)     0 (0.0%)    11
All         14 (17.3%)    25 (30.9%)    3 (3.7%)    39 (48.1%)    0 (0.0%)     0 (0.0%)    81
Expressions in the hedges subcategory are known to play a crucial role in academic writing as
they allow writers to express uncertainty regarding the truth value of their statements, enabling
them not only to show modesty and reservation but also to avoid personal accountability (Hyland,
1994). In our data, hedges were often expressed through adjectives introduced by the copula be,
as illustrated in b.1.
b. Stance p-frames
b.1. Hedges, e.g. are [less, relatively, more, not, also] likely to be
Ex. 9. Instead, we have shown that as oil wealth rises, autocracies are [less] likely to be
ousted by groups that would initiate new dictatorships ...
Ex. 10. Most importantly, empirical research has repeatedly shown that evangelical
Protestants are [relatively] likely to be lower class.
Epistemic stance frames are somewhat similar to hedges in that they also include expressions
of uncertainty. However, such frames have more to do with “knowledge claims or demonstrations”
and “reports of claims by others” (Simpson-Vlach & Ellis, 2010, p.506), as illustrated in b.2.
b.2. Epistemic stance, e.g. may or may not be [useful, protective]
Ex. 11. ... they may instead do so in their first language, which may or may not be [useful]
in helping them develop literacy in their L2.
Ex. 12. However, perceived control may or may not be [protective] against mortality ...
The ability and possibility frames express or introduce some possible action or proposition. In
social science RA introductions, the ability and possibility frames are often used to justify or
rationalize research focus or design, as illustrated in b.3.
b.3. Ability and possibility, e.g. allows us to [address, assess, explore, investigate, measure,
observe, study] the
Ex. 13. The time-series dimension allows us to [address] the potential endogeneity of
network ties.
Ex. 14. This allows us to [explore] the relationship between non-standard work hours and
fertility decisions from different perspectives ...
The evaluation category formed the largest group of stance p-frames. P-frames in this category
are often used to evaluate one’s own or others’ research through evaluative adjectives, as illustrated
in b.4.
b.4. Evaluation, e.g. it is important to [note, emphasize, underscore, acknowledge] that
Ex. 15. It is important to [note] that turning points may vary in valence (negative or
positive), severity, and duration across individuals.
Ex. 16. Likewise, it is important to [emphasize] that our experiment only studies a small
sampling of the many decision environments ...
3.2.3. Discourse organizing frames
Discourse organizing frames, the second largest group in our list, served four main functions
following Simpson-Vlach and Ellis’ (2010) taxonomy: metadiscourse, topic introduction, topic
elaboration, and discourse markers. The first subcategory, metadiscourse and textual reference,
includes frames that seem to be genre-specific, signaling the outline of the article, as illustrated in
c.1.
Table 6
Subcategories of discourse organizing p-frames.
Length      Metadiscourse   Topic introduction   Topic elaboration   Discourse markers   Total
Five-word   33 (47.1%)      13 (18.6%)           23 (32.9%)          1 (1.4%)            70
Six-word    32 (59.3%)      20 (37.0%)           1 (1.9%)            1 (1.9%)            54
All         65 (52.4%)      33 (26.6%)           24 (19.4%)          2 (1.6%)            124
c. Discourse organizing frames
c.1. Metadiscourse, e.g. the article/paper is [organized, structured] as follows
Ex. 17. The article is [structured] as follows: first, the literature on learners’ cognitive
processes in L2 pragmatics research is reviewed ...
Ex. 18. The paper is [organized] as follows: Section 2 presents the social choice
environment.
The subcategory of topic introduction and focus signals the topic or the goal of the research.
This category, as Simpson-Vlach and Ellis (2010) noted, functionally overlaps with the
identification and focus category under referential expressions to some degree. The main
difference between the two is that the topic introduction and focus frames serve more of a “global
discourse organizing function of introducing a topic,” as illustrated in c.2, whereas the
identification and focus frames have more to do with the “local referential function of identification”
(p.507).
c.2. Topic introduction and focus, e.g. the primary [purpose, goal, aim, objective, contribution]
of this study/article/paper
Ex. 19. The primary [purpose] of this study was to classify the regime types for twenty-
four countries in the Americas ...
Ex. 20. Accordingly, the primary [goal] of this study is to test and extend the
metatheoretical framework proposed by Ferris and colleagues ...
The topic elaboration subcategory relates to explicating and elaborating a topic previously
introduced. Many frames in this category include phrases signaling a cause/reason and effect
relationship, as illustrated in c.3.
c.3. Topic elaboration, e.g. to [assess, estimate, evaluate, examine, explore, measure, study,
test] the effect(s) of
Ex. 21. Hypotheses are developed to [evaluate] the effects of item positioning on response
behavior under the three mechanisms.
Ex. 22. The test was designed to [estimate] the effect of paid search on sales ...
Discourse markers generally serve to connect ideas smoothly and logically. Our list only
contains a few p-frames functioning as discourse markers (e.g. c.4), partially because many of
them were considered to be better represented as lexical bundles rather than p-frames (e.g. on the
other hand *) and thus removed at the manual filtering stage.
c.4. Discourse markers, e.g. in addition to the [literature, amount, outcome, above-mentioned]
Ex. 23. In addition to the [literature] on contract theory and mechanism design with limited
commitment, our analysis is related to two other strands of literature ...
Ex. 24. In addition to the [above-mentioned] reasons, an early start of FL learning has
been uncritically accompanied by expectations of superior L2 outcomes ...
The functional categories of the p-frames discussed above are not meant to be taken as
definitive and exclusive, as some have multiple functions, but rather as indications of the most
salient function they tend to fulfill in social science RA introductions.
4. Discussion
The purpose of this study was to derive a pedagogically useful list of p-frames from a corpus
of a specific part-genre, i.e. research article introductions, in six social science disciplines. This
research aim was motivated by insights into the importance of formulaic sequences in academic
English as well as the variation in formulaic language use across different registers, genres, and
disciplines (e.g. Biber et al., 2004; Cortes, 2013; Coxhead & Byrd, 2007; Cunningham, 2017;
Grabowski, 2015; Hyland, 2008), the success of previous corpus-based efforts in compiling lists of
academic formulaic expressions (e.g. Simpson-Vlach & Ellis, 2010; Martinez & Schmitt, 2012),
the absence of academic p-frame lists, and the perceived value of p-frame lists for EAP writing
pedagogy (Cunningham, 2017; Fletcher, 2006, 2011; Gray & Biber, 2013). The study thus serves
to fill an important gap in academic formulaic language research.
Recognizing the limitations of the bundles-to-frame approach to p-frame extraction (e.g.
Cunningham, 2017; Römer, 2010), we argued for and adopted the fully inductive approach in
which p-frames are identified based on all continuous lexical sequences, rather than just lexical
bundles, found in the corpus (Fuster-Márquez & Pennock-Speck, 2015; Grabowski, 2015; Gray &
Biber, 2013). A combination of corpus statistics was used to extract an initial set of p-frame
candidates with adequate frequency, variant diversity, and range across disciplines using kfNgram
(Fletcher, 2011). These candidates were then manually filtered in several steps to ensure their
semantic completeness and pedagogical value. The resulting 370 five-word p-frames and 84 six-
word p-frames were analyzed using Gray and Biber’s (2013) structural taxonomy and Simpson-
Vlach and Ellis’ (2010) functional taxonomy. Overall, the majority of p-frames (60.8%) were
other-content-word (excluding verbs) frames. While Cunningham (2017) used a different
taxonomy (Biber et al., 1999) for structural analysis, it is clear that the majority of the p-frames
identified from the corpus of mathematics research articles were verb-based. The functional
analysis was performed using a variant-based approach (e.g. Fuster-Márquez & Pennock-Speck,
2015; Römer, 2010), which examines the functions of specific realizations of each p-frame in
context, rather than a fixed-frame-based approach (Grabowski, 2015), which determines the
function of each p-frame based on its fixed components. We chose the variant-based approach on the
consideration that the functional categories assigned to different variants of the p-frames in context
may prove useful in helping EAP learners acquire contextually appropriate uses of the p-frames
and their variants. The variant-based approach resulted in a subset of p-frames being categorized
in more than one functional category. Overall, referential p-frames accounted for the largest
proportion (47.8%), but the majority of six-word p-frames (64.3%) were discourse organizing
frames. The final list is organized by function and presented in Appendix A, with multifunctional
p-frames listed under separate categories with the corresponding variants.
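The fully inductive extraction summarized above can be illustrated with a small Python sketch. It is not kfNgram and does not reproduce the study's settings: the function name, the threshold values, the decision to allow the open slot in any position, and the two-sentence toy corpus are all assumptions made for illustration. The sketch derives candidate frames from every continuous n-word sequence and keeps those that meet minimum frequency, variant diversity, and range across disciplines.

```python
from collections import defaultdict


def extract_pframes(corpus, n=5, min_freq=2, min_variants=2, min_range=2):
    """Extract p-frame candidates from `corpus`, a mapping from discipline
    name to a list of tokenized sentences. Thresholds are hypothetical."""
    variants = defaultdict(lambda: defaultdict(int))  # frame -> filler -> count
    disciplines = defaultdict(set)                    # frame -> disciplines seen
    for discipline, sentences in corpus.items():
        for tokens in sentences:
            for i in range(len(tokens) - n + 1):
                ngram = tokens[i:i + n]
                # Open each position in turn to form a one-slot frame.
                for slot in range(n):
                    frame = tuple(ngram[:slot] + ["*"] + ngram[slot + 1:])
                    variants[frame][ngram[slot]] += 1
                    disciplines[frame].add(discipline)
    selected = {}
    for frame, fillers in variants.items():
        frequency = sum(fillers.values())
        if (frequency >= min_freq
                and len(fillers) >= min_variants
                and len(disciplines[frame]) >= min_range):
            selected[" ".join(frame)] = dict(fillers)
    return selected


# Toy corpus: one tokenized sentence fragment per discipline.
corpus = {
    "economics": [["the", "aim", "of", "this", "article", "is"]],
    "sociology": [["the", "purpose", "of", "this", "article", "is"]],
}
candidates = extract_pframes(corpus)
```

On this toy input, only the frames whose slot is filled by aim in one discipline and purpose in the other survive the filters. Candidates produced this way would still require the manual filtering for semantic completeness and pedagogical value described above.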
Evaluation of a random sample of 100 p-frames by a panel of academic writing instructors and
student writers indicated that the overwhelming majority (91%) of the p-frames on the list were
considered pedagogically useful to either the instructors or student writers, or both. While most
previous studies solicited review by experts only, such as instructors, testers, publishers, and
lexicographers (e.g. Ackerman & Chen, 2013; Simpson-Vlach & Ellis, 2010), we considered it
useful to include student or learner perspectives as well, as ultimately the list is intended to serve
their learning needs. Previous studies also commonly used the review by a large panel of experts
to eliminate candidate expressions that had already been filtered by the researchers (e.g. Ackerman
& Chen, 2013; Martinez & Schmitt, 2012). While we acknowledge the benefits of having a larger
panel of reviewers with diverse academic backgrounds, we also deem it difficult to adequately
represent the pedagogical needs of all academic writers, given the many criteria that should be
considered (e.g. L1 background, level of experience and expertise in academic writing, discipline,
etc.). As such, the use of any panel’s judgment as an absolute criterion to include or exclude candidate
p-frames does not necessarily constitute the optimal solution, as confirmed by our panel
members. It may indeed be more productive to leave room for EAP teachers and learners to make
their own judgment based on their specific pedagogical contexts and learning needs.
Some overlap exists between p-frames and other types of academic expressions. For example,
complete variants of the p-frames, when examined individually, are reminiscent of formulas and
phrases, and the fillers of the p-frames may remind one of collocations. However, unlike
individual formulas and phrases, p-frames provide information about patterns and their variability,
and unlike collocations presented in isolation, p-frames contextualize co-occurring words
in syntactic patterns. As such, the p-frame list constitutes a useful addition to existing lists of
academic vocabulary (Coxhead, 2000; Gardner & Davies, 2013), collocations (Ackerman & Chen,
2013), and continuous formulaic sequences (Martinez & Schmitt, 2012; Morley, 2015; Simpson-Vlach
& Ellis, 2010) reviewed earlier.
5. Conclusion
The current study has extracted, analyzed, and evaluated a pedagogically useful list of p-frames
from a corpus of social science research article introductions. Focusing on six social science
disciplines, this study sits somewhere in the middle of the discipline-specificity continuum. While
we did not focus on inter-disciplinary variation, the identification of p-frames that are uniquely
useful in specific disciplines would certainly constitute a productive avenue of future research (e.g.
Cunningham, 2017). Additionally, although the p-frames identified in our study all occur in at
least two disciplines, the p-frame variants may be analyzed in terms of their specificity to
individual disciplines (e.g. Fuster-MĂĄrquez, 2014; Grabowski, 2015).
The focus on the specific part-genre of RA introductions may be both a limitation and a strength.
On the one hand, it limits the pedagogical value of the list compiled primarily to this part-genre.
On the other hand, awareness of linguistic variation across genres and part-genres is critical to the
development of EAP learners’ genre competence, and resources targeting high-stakes genres and
part-genres such as RA introductions will prove valuable for pedagogy aimed at promoting that
awareness (Cortes, 2013; Hyland, 2007). The research outcome generated in this study also paves
the way for our ongoing research on aligning p-frames with rhetorical moves and steps in RA
introductions, and on identifying p-frames for other RA part-genres.
The pedagogical applications of the p-frame list compiled need to be carefully considered. For
EAP courses that take a genre approach to teaching academic writing, the p-frame list can serve
as a useful resource for assisting students’ analysis of language features that characterize RAs.
However, in our view, lists of different types of academic expressions are best used in an
integrative way to maximize their potential for promoting students’ genre competence. For
example, as students identify important collocations in RAs, the p-frame list can be a handy tool
to help them see the range of contexts or syntactic environments in which they occur in RAs.
Similarly, as students notice formulas that are frequently used in RAs, the p-frame list can help
them see patterns that such formulas fit in as well as related variants that they could use. We also
expect the list to serve as one of many useful reference tools to novice social science scholars as
they engage in RA writing. Our future research will investigate the pedagogical uses of this list in
genre-based academic writing classrooms and actual RA writing contexts. We also call for more
empirical research examining the feasibility and effectiveness of integrative pedagogical
applications of the different types of academic formulaic expression lists to validate existing lists,
identify best practices in using them, and inform future efforts in compiling new lists.
Appendix A. The complete phrase-frame list for social science research article introductions
The complete list of p-frames can be found at
http://www.personal.psu.edu/xxl13/download.html.
References
Ackerman, K., & Chen, Y-H. (2013). Developing the Academic Collocation List (ACL) – A
corpus-driven and expert-judged approach. Journal of English for Academic Purposes, 12,
235-247.
Anthony, L. (2017). AntConc (Version 3.5.0) [Computer Software]. Tokyo, Japan: Waseda
University. Available from http://www.laurenceanthony.net/software
Biber, D. (2009). A corpus-driven approach to formulaic language in English. International
Journal of Corpus Linguistics, 14, 275-311.
Biber, D., & Conrad, S. (2009). Register, genre, and style. Cambridge: Cambridge University
Press.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at ...: Lexical bundles in university teaching
and textbooks. Applied Linguistics, 25, 371-405.
Biber, D., Leech, G., Johansson, S., Conrad, S., & Finegan, E. (1999). Longman grammar of
spoken and written English. London: Longman.
Cortes, V. (2013). The purpose of this study is to: Connecting lexical bundles and moves in
research article introductions. Journal of English for Academic Purposes, 12, 33-43.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213-238.
Coxhead, A., & Byrd, P. (2007). Preparing writing teachers to teach the vocabulary and grammar
of academic prose. Journal of Second Language Writing, 16, 129-147.
Cunningham, K. J. (2017). A phraseological exploration of recent mathematics research articles
through key phrase frames. Journal of English for Academic Purposes, 25, 71-83.
Fletcher, W. H. (2006). “Phrases in English” Home. Available from http://phrasesinenglish.org/
Fletcher, W. H. (2011). KfNgram. Annapolis, MD: USNA.
Fuster-Márquez, M. (2014). Lexical bundles and phrase frames in the language of hotel websites.
English Text Construction, 7, 84-121.
Fuster-Márquez, M., & Pennock-Speck, B. (2015). Target frames in British hotel
websites. International Journal of English Studies, 15, 51-69.
Gardner, D., & Davies, M. (2013). A new academic vocabulary list. Applied Linguistics, 35(3),
305-327.
Grabowski, Ł. (2015). Phrase frames in English pharmaceutical discourse: A corpus-driven study
of intradisciplinary register variation. Research in Language, 13, 266-291.
Gray, B., & Biber, D. (2013). Lexical frames in academic prose and conversation. International
Journal of Corpus Linguistics, 18, 109-136.
Herbst, T. (2011). Choosing sandy beaches – Collocations, probabemes and the idiom principle.
In T. Herbst, S. Faulhaber, & P. Uhrig (Eds.), The phraseological view of language (pp. 27-
57). Berlin: Walter de Gruyter.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge: Cambridge University Press.
Hyland, K. (1994). Hedging in academic writing and EAP textbooks. English for Specific
Purposes, 13, 239-56.
Hyland, K. (2007). Genre pedagogy: Language, literacy and L2 writing instruction. Journal of
Second Language Writing, 16, 148-164.
Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific
Purposes, 27(1), 4-21.
Johansson, S. (2011). Corpus, lexis, discourse: a tribute to John Sinclair. In T. Herbst, S. Faulhaber,
& P. Uhrig (Eds.), The phraseological view of language, (pp. 7-26). Berlin: Walter de Gruyter.
Martinez, R., & Schmitt, N. (2012). A phrasal expressions list. Applied Linguistics, 33, 299-320.
Morley, J. (2015). The Academic Phrasebank: An academic writing resource for students and
researchers. Manchester, UK: The University of Manchester.
Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford:
Oxford University Press.
Paltridge, B. (2004). Academic writing. Language Teaching, 37, 87-105.
Römer, U. (2010). Establishing the phraseological profile of a text type: The construction of
meaning in academic book reviews. English Text Construction, 3, 95-119.
Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list: New methods in phraseology
research. Applied Linguistics, 31, 487-512.
Sinclair, J. McH. (1991). Corpus concordance collocation. Oxford: Oxford University Press.
Sinclair, J. McH., & Carter, R. (Eds.) (2004). Trust the text: Language, corpus and discourse.
London/New York: Routledge.
Stubbs, M. (2009). Technology and phraseology: With notes on the history of corpus linguistics.
In U. Römer & R. Schulze (Eds.), Exploring the lexis-grammar interface, (pp. 15-31).
Amsterdam/Philadelphia: John Benjamins.
Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford: Oxford University Press.
Xiaofei Lu is Associate Professor of Applied Linguistics and Asian Studies at The
Pennsylvania State University, where he directs the graduate programs in the Department
of Applied Linguistics. His research interests are primarily in corpus linguistics, intelligent
computer-assisted language learning, English for Academic Purposes, and second
language writing. He is the author of Computational Methods for Corpus Annotation and
Analysis (2014, Springer).
Jungwan Yoon is a Ph.D. candidate in the Department of Applied Linguistics at The
Pennsylvania State University. Her research interests include academic literacy
development, second language writing, corpus linguistics, and discourse analysis.
Olesya Kisselev is a Ph.D. candidate in the Department of Applied Linguistics at The
Pennsylvania State University. Before coming to Penn State, she was an instructor and
curriculum developer in the Russian Flagship Program at Portland State University. Her
research interests include corpus linguistics and discourse analysis, especially as they apply
to the study of various aspects of second language and heritage language acquisition.

Academic-Phrasebank.pdfAcademic-Phrasebank.pdf
Academic-Phrasebank.pdf
 
lexicography
lexicographylexicography
lexicography
 
The psychometric analysis of the persian version of the strategy inventory fo...
The psychometric analysis of the persian version of the strategy inventory fo...The psychometric analysis of the persian version of the strategy inventory fo...
The psychometric analysis of the persian version of the strategy inventory fo...
 
A Rhetorical Move Analysis Of TEFL Thesis Abstracts The Case Of Allameh Taba...
A Rhetorical Move Analysis Of TEFL Thesis Abstracts  The Case Of Allameh Taba...A Rhetorical Move Analysis Of TEFL Thesis Abstracts  The Case Of Allameh Taba...
A Rhetorical Move Analysis Of TEFL Thesis Abstracts The Case Of Allameh Taba...
 
Academic Vocabulary In Tourism Research Articles A Corpus-Based Study
Academic Vocabulary In Tourism Research Articles  A Corpus-Based StudyAcademic Vocabulary In Tourism Research Articles  A Corpus-Based Study
Academic Vocabulary In Tourism Research Articles A Corpus-Based Study
 
A Study Of Lexical Ties Used In Medical Science Articles Written By Iranian A...
A Study Of Lexical Ties Used In Medical Science Articles Written By Iranian A...A Study Of Lexical Ties Used In Medical Science Articles Written By Iranian A...
A Study Of Lexical Ties Used In Medical Science Articles Written By Iranian A...
 
A Research Of English Article Errors In Writings By Chinese ESL Learners
A Research Of English Article Errors In Writings By Chinese ESL LearnersA Research Of English Article Errors In Writings By Chinese ESL Learners
A Research Of English Article Errors In Writings By Chinese ESL Learners
 
Applying Corpus-Based Findings To Form-Focused Instruction The Case Of Repor...
Applying Corpus-Based Findings To Form-Focused Instruction  The Case Of Repor...Applying Corpus-Based Findings To Form-Focused Instruction  The Case Of Repor...
Applying Corpus-Based Findings To Form-Focused Instruction The Case Of Repor...
 
Assessing EFL Learner,s Authorial Stance in Academic Writing: A Case of Out T...
Assessing EFL Learner,s Authorial Stance in Academic Writing: A Case of Out T...Assessing EFL Learner,s Authorial Stance in Academic Writing: A Case of Out T...
Assessing EFL Learner,s Authorial Stance in Academic Writing: A Case of Out T...
 
Corpus Analysis in Corpus linguistics
Corpus Analysis in Corpus linguistics Corpus Analysis in Corpus linguistics
Corpus Analysis in Corpus linguistics
 
DESPI-SYNTACTIC-DEVIATION.pdf
DESPI-SYNTACTIC-DEVIATION.pdfDESPI-SYNTACTIC-DEVIATION.pdf
DESPI-SYNTACTIC-DEVIATION.pdf
 
Psycholinguistic Analysis of Topic Familiarity and Translation Task Effects o...
Psycholinguistic Analysis of Topic Familiarity and Translation Task Effects o...Psycholinguistic Analysis of Topic Familiarity and Translation Task Effects o...
Psycholinguistic Analysis of Topic Familiarity and Translation Task Effects o...
 
A Contrastive Study Of Genric Organization Of Thesis Discussion Section Writt...
A Contrastive Study Of Genric Organization Of Thesis Discussion Section Writt...A Contrastive Study Of Genric Organization Of Thesis Discussion Section Writt...
A Contrastive Study Of Genric Organization Of Thesis Discussion Section Writt...
 
Analysing and interpreting discipline based language
Analysing and interpreting discipline based languageAnalysing and interpreting discipline based language
Analysing and interpreting discipline based language
 
An exploration of the generic structures of problem statements in research ...
An exploration of the generic structures of problem   statements in research ...An exploration of the generic structures of problem   statements in research ...
An exploration of the generic structures of problem statements in research ...
 
A Review Of How To Analyse Texts A Toolkit For Students Of English
A Review Of How To Analyse Texts  A Toolkit For Students Of EnglishA Review Of How To Analyse Texts  A Toolkit For Students Of English
A Review Of How To Analyse Texts A Toolkit For Students Of English
 

More from Tye Rausch

PPT - THE PERSUASIVE ESSAY PowerPoint Presentation, Free Download - ID
PPT - THE PERSUASIVE ESSAY PowerPoint Presentation, Free Download - IDPPT - THE PERSUASIVE ESSAY PowerPoint Presentation, Free Download - ID
PPT - THE PERSUASIVE ESSAY PowerPoint Presentation, Free Download - IDTye Rausch
 
5 Tips For Writing A Good History Paper Complete D
5 Tips For Writing A Good History Paper Complete D5 Tips For Writing A Good History Paper Complete D
5 Tips For Writing A Good History Paper Complete DTye Rausch
 
Argumentative Essay Thesis Statement Examples
Argumentative Essay Thesis Statement ExamplesArgumentative Essay Thesis Statement Examples
Argumentative Essay Thesis Statement ExamplesTye Rausch
 
Child Labour Essay For School Students In English Essay On Child Labour
Child Labour Essay For School Students In English Essay On Child LabourChild Labour Essay For School Students In English Essay On Child Labour
Child Labour Essay For School Students In English Essay On Child LabourTye Rausch
 
Grinch Day Writing.Pdf Christmas Kindergarten, W
Grinch Day Writing.Pdf Christmas Kindergarten, WGrinch Day Writing.Pdf Christmas Kindergarten, W
Grinch Day Writing.Pdf Christmas Kindergarten, WTye Rausch
 
Scarecrow Craft And Writing Bulletin Board Writing Cra
Scarecrow Craft And Writing Bulletin Board Writing CraScarecrow Craft And Writing Bulletin Board Writing Cra
Scarecrow Craft And Writing Bulletin Board Writing CraTye Rausch
 
PPT - Three Stages Of Writing PowerPoint Presentation, Free Dow
PPT - Three Stages Of Writing PowerPoint Presentation, Free DowPPT - Three Stages Of Writing PowerPoint Presentation, Free Dow
PPT - Three Stages Of Writing PowerPoint Presentation, Free DowTye Rausch
 
Hi-Write Paper Highlighted Yellow Lines Writing Paper
Hi-Write Paper Highlighted Yellow Lines Writing PaperHi-Write Paper Highlighted Yellow Lines Writing Paper
Hi-Write Paper Highlighted Yellow Lines Writing PaperTye Rausch
 
How To Write A Science Fair Introduction - YouTube
How To Write A Science Fair Introduction - YouTubeHow To Write A Science Fair Introduction - YouTube
How To Write A Science Fair Introduction - YouTubeTye Rausch
 
Essay Examples For The ACT Test (PDF)
Essay Examples For The ACT Test (PDF)Essay Examples For The ACT Test (PDF)
Essay Examples For The ACT Test (PDF)Tye Rausch
 
Gingerbread Man Maps Writing Center, Gingerbrea
Gingerbread Man Maps Writing Center, GingerbreaGingerbread Man Maps Writing Center, Gingerbrea
Gingerbread Man Maps Writing Center, GingerbreaTye Rausch
 
Easy Short English Paragraph On Politics Wri
Easy Short English Paragraph On Politics WriEasy Short English Paragraph On Politics Wri
Easy Short English Paragraph On Politics WriTye Rausch
 
How To Write A Critical Essay - Tips Examples
How To Write A Critical Essay - Tips  ExamplesHow To Write A Critical Essay - Tips  Examples
How To Write A Critical Essay - Tips ExamplesTye Rausch
 
How To Write An Essay Transitions (With Worksheet) - YouTube Essay ...
How To Write An Essay Transitions (With Worksheet) - YouTube  Essay ...How To Write An Essay Transitions (With Worksheet) - YouTube  Essay ...
How To Write An Essay Transitions (With Worksheet) - YouTube Essay ...Tye Rausch
 
Reading Essay Example. Reading And Writing Essay
Reading Essay Example. Reading And Writing EssayReading Essay Example. Reading And Writing Essay
Reading Essay Example. Reading And Writing EssayTye Rausch
 
Strategies For Improving Handwriting - Make Take
Strategies For Improving Handwriting - Make TakeStrategies For Improving Handwriting - Make Take
Strategies For Improving Handwriting - Make TakeTye Rausch
 
Pay For Essay And Get The Best Paper You Need - How To Write For Free ...
Pay For Essay And Get The Best Paper You Need - How To Write For Free ...Pay For Essay And Get The Best Paper You Need - How To Write For Free ...
Pay For Essay And Get The Best Paper You Need - How To Write For Free ...Tye Rausch
 
A Literature Review Of Lesson Study In Initial Teacher Education
A Literature Review Of Lesson Study In Initial Teacher EducationA Literature Review Of Lesson Study In Initial Teacher Education
A Literature Review Of Lesson Study In Initial Teacher EducationTye Rausch
 
An Investigation Of The Effects Of Citation Instruction To Avoid Plagiarism I...
An Investigation Of The Effects Of Citation Instruction To Avoid Plagiarism I...An Investigation Of The Effects Of Citation Instruction To Avoid Plagiarism I...
An Investigation Of The Effects Of Citation Instruction To Avoid Plagiarism I...Tye Rausch
 
Academic Writing As Craft
Academic Writing As CraftAcademic Writing As Craft
Academic Writing As CraftTye Rausch
 

More from Tye Rausch (20)

PPT - THE PERSUASIVE ESSAY PowerPoint Presentation, Free Download - ID
PPT - THE PERSUASIVE ESSAY PowerPoint Presentation, Free Download - IDPPT - THE PERSUASIVE ESSAY PowerPoint Presentation, Free Download - ID
PPT - THE PERSUASIVE ESSAY PowerPoint Presentation, Free Download - ID
 
5 Tips For Writing A Good History Paper Complete D
5 Tips For Writing A Good History Paper Complete D5 Tips For Writing A Good History Paper Complete D
5 Tips For Writing A Good History Paper Complete D
 
Argumentative Essay Thesis Statement Examples
Argumentative Essay Thesis Statement ExamplesArgumentative Essay Thesis Statement Examples
Argumentative Essay Thesis Statement Examples
 
Child Labour Essay For School Students In English Essay On Child Labour
Child Labour Essay For School Students In English Essay On Child LabourChild Labour Essay For School Students In English Essay On Child Labour
Child Labour Essay For School Students In English Essay On Child Labour
 
Grinch Day Writing.Pdf Christmas Kindergarten, W
Grinch Day Writing.Pdf Christmas Kindergarten, WGrinch Day Writing.Pdf Christmas Kindergarten, W
Grinch Day Writing.Pdf Christmas Kindergarten, W
 
Scarecrow Craft And Writing Bulletin Board Writing Cra
Scarecrow Craft And Writing Bulletin Board Writing CraScarecrow Craft And Writing Bulletin Board Writing Cra
Scarecrow Craft And Writing Bulletin Board Writing Cra
 
PPT - Three Stages Of Writing PowerPoint Presentation, Free Dow
PPT - Three Stages Of Writing PowerPoint Presentation, Free DowPPT - Three Stages Of Writing PowerPoint Presentation, Free Dow
PPT - Three Stages Of Writing PowerPoint Presentation, Free Dow
 
Hi-Write Paper Highlighted Yellow Lines Writing Paper
Hi-Write Paper Highlighted Yellow Lines Writing PaperHi-Write Paper Highlighted Yellow Lines Writing Paper
Hi-Write Paper Highlighted Yellow Lines Writing Paper
 
How To Write A Science Fair Introduction - YouTube
How To Write A Science Fair Introduction - YouTubeHow To Write A Science Fair Introduction - YouTube
How To Write A Science Fair Introduction - YouTube
 
Essay Examples For The ACT Test (PDF)
Essay Examples For The ACT Test (PDF)Essay Examples For The ACT Test (PDF)
Essay Examples For The ACT Test (PDF)
 
Gingerbread Man Maps Writing Center, Gingerbrea
Gingerbread Man Maps Writing Center, GingerbreaGingerbread Man Maps Writing Center, Gingerbrea
Gingerbread Man Maps Writing Center, Gingerbrea
 
Easy Short English Paragraph On Politics Wri
Easy Short English Paragraph On Politics WriEasy Short English Paragraph On Politics Wri
Easy Short English Paragraph On Politics Wri
 
How To Write A Critical Essay - Tips Examples
How To Write A Critical Essay - Tips  ExamplesHow To Write A Critical Essay - Tips  Examples
How To Write A Critical Essay - Tips Examples
 
How To Write An Essay Transitions (With Worksheet) - YouTube Essay ...
How To Write An Essay Transitions (With Worksheet) - YouTube  Essay ...How To Write An Essay Transitions (With Worksheet) - YouTube  Essay ...
How To Write An Essay Transitions (With Worksheet) - YouTube Essay ...
 
Reading Essay Example. Reading And Writing Essay
Reading Essay Example. Reading And Writing EssayReading Essay Example. Reading And Writing Essay
Reading Essay Example. Reading And Writing Essay
 
Strategies For Improving Handwriting - Make Take
Strategies For Improving Handwriting - Make TakeStrategies For Improving Handwriting - Make Take
Strategies For Improving Handwriting - Make Take
 
Pay For Essay And Get The Best Paper You Need - How To Write For Free ...
Pay For Essay And Get The Best Paper You Need - How To Write For Free ...Pay For Essay And Get The Best Paper You Need - How To Write For Free ...
Pay For Essay And Get The Best Paper You Need - How To Write For Free ...
 
A Literature Review Of Lesson Study In Initial Teacher Education
A Literature Review Of Lesson Study In Initial Teacher EducationA Literature Review Of Lesson Study In Initial Teacher Education
A Literature Review Of Lesson Study In Initial Teacher Education
 
An Investigation Of The Effects Of Citation Instruction To Avoid Plagiarism I...
An Investigation Of The Effects Of Citation Instruction To Avoid Plagiarism I...An Investigation Of The Effects Of Citation Instruction To Avoid Plagiarism I...
An Investigation Of The Effects Of Citation Instruction To Avoid Plagiarism I...
 
Academic Writing As Craft
Academic Writing As CraftAcademic Writing As Craft
Academic Writing As Craft
 

Recently uploaded

CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)Dr. Mazin Mohamed alkathiri
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
à€­à€Ÿà€°à€€-à€°à„‹à€ź à€”à„à€Żà€Ÿà€Șà€Ÿà€°.pptx, Indo-Roman Trade,
à€­à€Ÿà€°à€€-à€°à„‹à€ź à€”à„à€Żà€Ÿà€Șà€Ÿà€°.pptx, Indo-Roman Trade,à€­à€Ÿà€°à€€-à€°à„‹à€ź à€”à„à€Żà€Ÿà€Șà€Ÿà€°.pptx, Indo-Roman Trade,
à€­à€Ÿà€°à€€-à€°à„‹à€ź à€”à„à€Żà€Ÿà€Șà€Ÿà€°.pptx, Indo-Roman Trade,Virag Sontakke
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 

Recently uploaded (20)

CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
à€­à€Ÿà€°à€€-à€°à„‹à€ź à€”à„à€Żà€Ÿà€Șà€Ÿà€°.pptx, Indo-Roman Trade,
à€­à€Ÿà€°à€€-à€°à„‹à€ź à€”à„à€Żà€Ÿà€Șà€Ÿà€°.pptx, Indo-Roman Trade,à€­à€Ÿà€°à€€-à€°à„‹à€ź à€”à„à€Żà€Ÿà€Șà€Ÿà€°.pptx, Indo-Roman Trade,
à€­à€Ÿà€°à€€-à€°à„‹à€ź à€”à„à€Żà€Ÿà€Șà€Ÿà€°.pptx, Indo-Roman Trade,
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 

A Phrase-Frame List For Social Science Research Article Introductions

Abstract

This study aimed to contribute to recent corpus-based efforts in compiling lists of academic expressions by deriving a pedagogically useful list of phrase-frames for a specific part-genre, i.e. research article introductions, in six social science disciplines. A combination of corpus statistics was used to extract an initial set of phrase-frame candidates with adequate frequency, variant diversity, and range across disciplines. These candidates were then manually filtered in several steps to ensure their semantic completeness and pedagogical value. The resulting 370 five-word phrase-frames and 84 six-word phrase-frames were analyzed structurally and functionally. Evaluation of a random sample of 100 phrase-frames by a panel of academic writing instructors and student writers indicated that the overwhelming majority of the phrase-frames were considered pedagogically useful by either the instructors or the student writers, or both. The implications of the current study for academic formulaic language research and of the phrase-frame list compiled for academic writing pedagogy are considered.

Keywords: Academic writing; Formulaic language; Phrase-frames; Research article introductions

1. Introduction

In the past decades, many corpus-based studies have argued for the necessity of lists of academic formulaic expressions for students and teachers of English for Academic Purposes (EAP), explored the methodological issues involved in compiling such lists, and presented several pedagogically useful lists of different types of academic expressions (e.g. Ackerman & Chen, 2013; Biber et al., 2004; Martinez & Schmitt, 2012; Morley, 2015; Nattinger & DeCarrico, 1992; Simpson-Vlach & Ellis, 2010). The EAP community has shown substantial interest in such lists and recognized the importance of their continued improvement and enrichment. One type of formulaic expression that is now increasingly seen as pedagogically relevant but that has not yet been systematically tackled in previous corpus-based endeavors is phrase-frames (hereafter p-frames), i.e. semi-fixed sequences that contain a variable slot that can be filled by different words, e.g. the * of the study, where the open slot may be filled by aim, goal, and purpose, among others.

In this study, we extend recent efforts in compiling pedagogically useful lists of academic formulaic expressions by deriving a list of p-frames frequently used in a corpus of a specific part-genre, i.e. research article (RA) introductions, in six social science disciplines. In doing so, we hope to contribute to the methodological discussion of the extraction and selection of candidate p-frames as well as the usefulness and feasibility of functional categorization of the p-frames for pedagogical applications.

1.1. Language as phraseology

A growing body of research accumulated in the past two decades in corpus linguistics has significantly contributed to the contemporary understanding of “language as phraseology” (Hunston, 2002, p.137). Language as phraseology is an assumption that positions the phrase, not individual words, as a fundamental unit of meaning. This assumption has a long history in the field of language pedagogy: language teachers were, possibly, the first to grapple with practical issues of formulaicity in language (Herbst, 2011; Sinclair & Carter, 2004; Stubbs, 2009). Indeed, any attempt by the learner to achieve a degree of native-like language ability inevitably results in the realization that there exist multiple conventionalized ways of stringing words together in a particular language that are unpredictable based on the traditionally understood “rules of grammar”.
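To make the notion of a p-frame from the Introduction concrete, the following is a minimal sketch of how frames such as the * of the study and their slot fillers might be counted in tokenized text. It is an illustration only, not the extraction procedure used in this study: the function name `extract_pframes`, the restriction to internal slots, and the three toy sentences are all assumptions made for the example.

```python
from collections import defaultdict


def extract_pframes(sentences, n=5):
    """Count p-frames of length n and the words that fill their open slot.

    Assumption for this sketch: the variable slot is internal, i.e. it is
    never the first or the last word of the n-gram.
    Returns a mapping: frame (tuple with "*" in the slot) -> {filler: count}.
    """
    frames = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        # Slide a window of n tokens over the sentence.
        for start in range(len(tokens) - n + 1):
            gram = tokens[start:start + n]
            # Open each internal position in turn to form a candidate frame.
            for slot in range(1, n - 1):
                frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
                frames[frame][gram[slot]] += 1
    return frames


# Hypothetical toy corpus (invented sentences, already lowercased and tokenized).
toy_corpus = [
    "the aim of the study is".split(),
    "the purpose of the study was".split(),
    "the goal of the study is".split(),
]

pframes = extract_pframes(toy_corpus, n=5)
target = ("the", "*", "of", "the", "study")
fillers = pframes[target]
print(" ".join(target), "tokens:", sum(fillers.values()),
      "variants:", sorted(fillers))
```

The token count of a frame and the number of distinct fillers correspond loosely to the frequency and variant-diversity notions mentioned in the abstract; any thresholds applied to them would need to be set separately.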
The need to identify and, subsequently, teach conventionalized sequences was recognized as early as the 1930s. In fact, the coinage of the term collocation belongs to the English language teacher and researcher Harold E. Palmer. Palmer’s definition of a collocation as “a succession of two or more words that must be learnt as an integral whole and not pieced together from its component parts” (Palmer, 1933 as cited in Stubbs, 2009, p.17) was effectively adopted by John Sinclair, who took the term and the idea behind it to build a whole new field around it (Herbst, 2011; Johansson, 2011; Stubbs, 2009).

The original notion of language as phraseology can be summarized along the lines of the Idiom Principle, which posits that “a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments” (Sinclair, 1991, p.110). Corpus linguistics has provided evidence that formulaicity is omnipresent in language, and that formulaic sequences are fundamental to the way language is stored, processed, acquired and used (Hunston, 2002; Wray, 2008). Suggesting that every conventionalized recurrent communicative function has a conventionalized linguistic form, corpus linguistics has also effectively created a repository of research that showed which and what types of multi-word sequences, or formulae, tend to appear in which or what types of language modalities, registers, and genres¹ (e.g. Biber et al., 2004; Hyland, 2008; Simpson-Vlach & Ellis, 2010). Biber et al. (2004) reported systematic differences in the distribution of lexical bundles of different structures and functions across the registers of conversation, classroom teaching, textbooks, and academic prose. Along similar lines, Simpson-Vlach and Ellis (2010) found that while some formulas occur frequently in both academic speaking and writing, many formulas occur primarily in one or the other. Based on these findings on mode and register variation in formulaic language use, many recent studies set out to analyze formulaic sequences in specific academic genres or part-genres, some with attention to inter-disciplinary variation. Hyland (2008), for example, analyzed a corpus of academic writing that consisted of research articles, dissertations, and theses, and systematically delineated variation in lexical bundle use among four disciplines.

¹ Registers are textual varieties “associated with a particular situation of use (including particular communicative purposes)” (Biber & Conrad, 2009, p.6); examples include conversation, classroom teaching, and textbooks. Genres are “abstract, socially recognised ways of using language” (Hyland, 2007, p. 149); examples include research articles, book reviews, and conference abstracts.

The understanding of language as phraseology and the insights from research into formulaic language use in different modalities, registers, and genres have impacted thinking on the teaching of academic language. In particular, it has been argued that incorporating analyses of formulaic sequences and their functions in specific modalities, registers, and genres of texts may improve the EAP writing curricula and learning outcomes (Coxhead & Byrd, 2007; Paltridge, 2004).

1.2. Pedagogically oriented lists of academic formulaic expressions

Recognizing the value of lists of academic formulaic expressions for facilitating EAP learners’ analyses and acquisition of formulaic sequences, corpus linguistics has made systematic efforts in deriving various types of such lists from corpora of academic language, such as the Academic Formulas List (AFL; Simpson-Vlach & Ellis, 2010) and the Phrasal Expressions List (PHRASE List; Martinez & Schmitt, 2012). Comparable to the aims of lists of academic vocabulary (e.g. Coxhead, 2000; Gardner & Davies, 2013), the AFL and the PHRASE List both aimed to represent pedagogically relevant contiguous formulaic sequences (e.g. in the present study) to be incorporated into curricula and teaching materials. Both lists were based on large corpora of academic speech and writing, which included different types of texts such as research articles and textbooks in the written portion, and lectures and seminars in the spoken portion.
Methodologically, both teams approached the selection of possible formulas using a combination of quantitative corpus measures and qualitative analytical procedures, albeit with different emphases. Simpson-Vlach and Ellis (2010) set out to generate a “formula teaching worth” (FTW) score for each formula, i.e. a composite score composed of frequency and mutual information (MI) that could predict human judgement of the teaching worth of the formula. To that end, the authors recruited judges with teaching and testing experience to rate a random sample of 108 formulas, using such criteria as “whether or not they thought the phrase constituted ‘a formulaic expression, or fixed phrase, or chunk’ ” and “whether or not they thought the phrase was ‘worth teaching, as a bona fide phrase or expression’ ” (p.496). They subsequently ran a multiple regression analysis on the rated sample to derive beta coefficients of frequency and MI as predictors of human rater scores and then used those coefficients to generate FTW scores for all formulas. They claimed that the final AFL contained, in theory, only pedagogically useful formulaic expressions.

Martinez and Schmitt (2012) cast doubt on the validity of this measure, given that only a small subset of the data was analyzed by human raters. They questioned such “strict adherence to statistically derived phrase selection” (p.306) and suggested that subjective judgment by raters with solid teaching and testing backgrounds should be included as a determinant for item inclusion. They incorporated several core criteria for the manual selection of candidates for the PHRASE List, based largely on Wray’s (2008) criteria for formulaicity. Examples of their criteria included “Is the expression a Morpheme Equivalent Unit (MEU)?” and “Is the expression semantically transparent?”. These criteria were applied to all expressions extracted from the British National Corpus.
Martinez and Schmitt conceded that while such a methodology was “extremely time and labor intensive,” the resulting list was “clearly enhanced pedagogically” (p.310).
Cortes (2013) and Morley (2015) took the qualitative analytical dimension a step further by aligning academic expressions with rhetorical functions. Both studies also focused on more specific EAP genres or part-genres. Cortes (2013) extracted a list of lexical bundles from a corpus of RA introductions and matched them to rhetorical moves and steps (e.g. one of the major was matched to the step ‘claiming relevance of field’). Morley (2015) extracted a list of phrases from a corpus of postgraduate dissertations and organized them by communicative function, e.g. ‘introducing problems and limitations’. He operationalized phrases broadly as expressions of variable length that were deemed useful for a communicative function, including both bundle-style items such as “The paper fails to specify ...” and rather long items such as “Difficulties arise, however, when an attempt is made to implement the policy”.

Taken together, existing corpus-based efforts in compiling lists of academic expressions all highlight the importance of considering both quantitative corpus statistics and qualitative analyses in identifying pedagogically relevant expressions. Meanwhile, these efforts have focused primarily on continuous formulaic expressions. In the current study, we expand this line of research by arguing that lists of academic formulaic expressions can be enriched with the inclusion of academic p-frames, a position which we explore in detail in the section below. While we leave the alignment of p-frames with rhetorical functions to a future study², we note that studies such as Cortes’ (2013) and Morley’s (2015) hold much promise in providing EAP students with not simply a list of academic expressions but a repository of linguistic units coupled with their specific rhetorical functions.

² The compilation of the p-frame list is a necessary first step toward matching p-frames to rhetorical functions, and (1) the motivation and methodology for deriving the p-frame list and the structural and functional analysis of the p-frames extracted and (2) the systematic alignment of p-frames with rhetorical functions both warrant in-depth discussion. We thus defer the reporting of results regarding frame-function alignment to a follow-up study.
1.3. Phrase-frames

In our view, lists of academic formulaic expressions will be greatly enhanced by the inclusion of phrase-frames (Fletcher, 2006, 2011), i.e. multi-word sequences in which words form a ‘frame’ around a variable slot. The variants of a frame often form one or more semantically close or functionally similar clusters. For example, the variable slot in the * of the study may be filled by purpose and goal as well as motivation and rationale. Many studies have underscored the importance of p-frames in academic discourse. Biber (2009), for instance, found that academic writing relies heavily on discontinuous frames, while conversation relies more on continuous lexical sequences. Gray and Biber (2013) further showed that p-frames are more variable in academic writing than in conversation. These observations give us reason to believe that novice EAP writers stand to benefit from pedagogical resources that provide not only continuous set expressions such as on the one hand but also discontinuous frames and their variants. Teaching variable frames may allow teachers to introduce more language while lessening the cognitive demand on memory; at the same time, examining the variants that fill the variable slot may be a valuable exercise in understanding the degree of formulaicity of various constructions.

To be fair, some of the lists reviewed above included some p-frames. For example, Simpson-Vlach and Ellis (2010) listed a few p-frames on the AFL (e.g. [a/large/the] number of), which they created by compounding multiple overlapping n-grams. Martinez and Schmitt (2012) also dealt with expressions “with a variable component” on an ad hoc basis, but they went a step further: when such an expression was noticed, they conducted “a careful follow-up search” in the corpus to “identify all variable forms of that expression” (p.312).
However, the limited number of p-frames included in these lists were identified using a procedure that derives p-frames from overlapping high-frequency n-grams (or lexical bundles). Such a procedure has been problematized by Gray and Biber (2013), who reported that numerous discontinuous sequences were not associated with lexical bundles. The extraction of p-frames thus necessitates a separate procedure from that for extracting lexical bundles.

In a recent study, Cunningham (2017) explored the use of p-frames in 128 mathematics RAs. She identified 180 p-frames specific to the mathematics discipline using a combination of frequency, range, and keyness criteria. These p-frames were then analyzed structurally and functionally. In her methodological procedure, the minimum frequency of each p-frame variant was set to three. This procedure suffered from the same limitation noted by Gray and Biber (2013), leaving less frequent p-frame variants unanalyzed and many meaningful p-frames with diverse but less frequent variants unidentified. The study’s singular focus on the mathematics discipline also calls for research on other disciplines.

The argument for general, non-disciplinary academic word and phrase lists is largely well taken, since, after all, EAP courses do usually cater to students from diverse majors and specializations. Nevertheless, linguistic variation across registers, genres, and disciplines is well documented (Biber et al., 1999; Hyland, 2008), and the need to “determine how meaning creation works” in subsets of language (i.e. specific registers, genres, and disciplines) “that show a specialized grammar and vocabulary” has been highlighted (Römer, 2010, p.308). This need prompted us to investigate p-frames in a particular EAP part-genre, i.e. RA introductions, within a set of social science disciplines. Several studies reviewed above (e.g. Cortes, 2013; Cunningham, 2017; Morley, 2015) have generated valuable insights into formulaic language use in specific academic genres and disciplines.
In line with such insights and keeping in mind the pedagogical imperative, we also hold the view that presenting lists of formulaic expressions for specific genres aids in fulfilling the promise of genre pedagogy, which sees “real benefits for learners as they pull together language, content, and contexts” (Hyland, 2007, p.150).

1.4. Overview of the current study

This study aims to add to recent corpus-based efforts in compiling lists of academic expressions by deriving a pedagogically useful list of p-frames for a specific part-genre, i.e. RA introductions, in six social science disciplines. To this end, we first identified a set of p-frames from a corpus of social science RA introductions and then subjected them to several manual filtering procedures. All p-frames included in the final list were analyzed structurally and functionally, and a subset was rated for pedagogical value by a panel of EAP writing instructors and student writers. In what follows, we detail our methodological procedure, present the results of the different steps of our procedure, and discuss the implications of our results for academic formulaic language research and academic writing pedagogy.

2. Methodology

2.1. Corpus

The corpus used in the current study comprises 517,703 words of published RA introduction sections sourced from the Corpus of Social Science Research Articles (COSSRA), developed by our research team. COSSRA includes 600 RAs published in 2012-2016 in six social science disciplines (Anthropology, Applied Linguistics, Economics, Political Science, Psychology, and Sociology), with 100 RAs sampled from five journals in each discipline. The journals were selected based on their impact factors, with their representativeness confirmed by two experts in each discipline. For each journal, we first sampled 20 issues in the period 2012-2016 and
subsequently sampled one RA per issue. The introduction sections of the RAs were extracted and each saved as a plain text file. All files were manually checked for errors resulting from the conversion process, and information unnecessary for p-frame identification was eliminated, including parenthetical citations, footnotes, and mathematical formulas. While many RAs contain an introduction section with the heading ‘Introduction’, some start with an untitled section followed by a titled section. We included such untitled first sections in our data as their primary rhetorical function was similar to that of the titled introduction sections.

Table 1 summarizes the number of tokens of the RA introductions in each discipline. The subcorpus of each discipline contained an approximately comparable number of tokens, with the exception of Economics, which had lengthier introductions than the other disciplines. While it would be ideal to compile a similar-sized subcorpus for each discipline, we opted to maintain an equal number of texts for each discipline.

Table 1
Composition of the corpus of social science research article introductions.

| Discipline | Texts | Tokens | Proportion |
| --- | --- | --- | --- |
| Anthropology | 100 | 76,263 | 14.7% |
| Applied Linguistics | 100 | 63,357 | 12.2% |
| Economics | 100 | 145,609 | 28.1% |
| Political Science | 100 | 78,291 | 15.1% |
| Psychology | 100 | 78,209 | 15.1% |
| Sociology | 100 | 75,974 | 14.6% |
| Total | 600 | 517,703 | 100% |

2.2. Procedure
The procedure for developing, analyzing, and assessing the p-frame list involved four stages: initial candidate extraction, manual filtering, structural and functional analysis, and rater assessment.

2.2.1. Automatic extraction of p-frame candidates

Several methodological issues needed to be considered in the initial candidate extraction stage. The first had to do with the approach to p-frame identification. Previous studies employed either the bundles-to-frame approach (e.g. Biber, 2009; Cunningham, 2017; Römer, 2010) or the fully inductive approach (e.g. Fuster-Márquez & Pennock-Speck, 2015; Grabowski, 2015; Gray & Biber, 2013). The former begins with identifying lexical bundles and then analyzes them to determine p-frames. As mentioned above, this approach would fail to capture the full range of variants of the p-frames identified and, as Gray and Biber (2013) noted, miss meaningful p-frames with highly diverse, infrequent variants. To avoid these limitations, the current study adopted the fully inductive approach, which identifies p-frames based on all continuous lexical sequences.

The second issue had to do with p-frame length. We began by extracting 4-word frames following the practice in several previous studies (Fuster-Márquez & Pennock-Speck, 2015; Grabowski, 2015; Gray & Biber, 2013; Römer, 2010). However, our preliminary analysis showed that the majority of the 4-word frames were incomplete units or contained function words only (e.g. the * of the). The pedagogical relevance of such units appeared questionable. We therefore decided to focus on five- and six-word frames only. The increased length allowed us to identify p-frames that are semantically more complete and more specific to the part-genre of RA introductions.
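The fully inductive approach described above can be illustrated with a minimal Python sketch. It is not the tool used in the study: the function names `extract_pframes` and `summarize`, the `min_variants` parameter, and the toy sentences are all hypothetical and serve only to show how every continuous n-word sequence can be grouped under each frame obtained by opening one slot.

```python
from collections import Counter, defaultdict


def extract_pframes(texts, n=5):
    """Map each frame (an n-gram with one position replaced by '*')
    to a Counter of the words observed in that slot."""
    frames = defaultdict(Counter)
    for tokens in texts:                      # n-grams never cross text boundaries
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            for slot in range(n):             # open a slot at every position in turn
                frame = gram[:slot] + ("*",) + gram[slot + 1:]
                frames[frame][gram[slot]] += 1
    return frames


def summarize(frames, min_variants=2):
    """Return (frame, token count, variant counts) for frames with enough distinct variants."""
    rows = []
    for frame, variants in frames.items():
        if len(variants) >= min_variants:
            rows.append((" ".join(frame), sum(variants.values()), dict(variants)))
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Toy input (an assumption for illustration): two invented "texts", already tokenized.
toy_texts = [
    "the purpose of the study is to examine".split(),
    "the goal of the study is to test".split(),
]
toy_frames = extract_pframes(toy_texts, n=5)
for frame, total, variants in summarize(toy_frames):
    print(frame, total, variants)
```

Because the frames are built from all n-grams rather than from high-frequency bundles only, a frame whose individual variants are each rare can still surface once its variants are pooled, which is the property that motivates the fully inductive approach.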
Another important methodological consideration was to determine the optimal combination of cut-off points for p-frame frequency, the number of variants, and the number of texts and disciplines in which a p-frame appears. There exists a certain degree of arbitrariness among previous studies in threshold setting. For example, depending on corpus size, researchers have set the frequency threshold from 10 occurrences per million words (PMW) (e.g. Simpson-Vlach & Ellis, 2010) to 40 occurrences PMW (e.g. Biber et al., 2004). To establish the ideal cut-off points appropriate for the size of our corpus, we conducted an exploratory investigation using a range of threshold combinations. As a result of this investigation, we settled on five-word p-frames with at least 16 occurrences PMW and six-word p-frames with at least 12 occurrences PMW; additionally, each p-frame should have two or more variants and should occur in three or more texts across two or more disciplines.

After all the necessary decisions were made, we extracted five- and six-word p-frames using kfNgram (Fletcher, 2011). For each p-frame, kfNgram provides its token count, a list of its variants, and the token count for each variant. The initial candidate p-frames extracted included all possible p-frames with a single variable slot in any position. However, we decided to discard p-frames with an initial variable slot as most of them crossed phrasal or clausal boundaries (e.g. * a growing body of with variants of to, so, reasoning, etc.). With these p-frames excluded, the initial candidate list included 594 five-word p-frames and 167 six-word p-frames.

2.2.2. Manual filtering

The candidate list required manual scrutiny to exclude p-frames that were not meaningful or pedagogically relevant. This involved a considerable amount of concordance analysis using AntConc 3.5.0 (Anthony, 2017), in which each p-frame was examined in its original contexts of use and filtered using three criteria.
First, frames which were linguistically incomplete (e.g. of the paper is *) or crossed clausal boundaries (e.g. organized as follows the * section) were excluded. Second, frames which could be subsumed under larger frames were discarded. For example, the article is * as was removed as it was part of the article is * as follows. Finally, frames which could be better represented as a lexical bundle (e.g. on the one hand *, where the slot can be filled by virtually any word) were also excluded. To avoid researcher bias, only items marked as “exclude” by all researchers were excluded. This procedure resulted in a final list of 370 five-word p-frames and 84 six-word p-frames.

2.2.3. Structural and functional classification

The final entries underwent structural and functional analyses based on Gray and Biber’s (2013) structural taxonomy and Simpson-Vlach and Ellis’ (2010) functional taxonomy. It was our hope that information on the structure and function of each p-frame would enhance the usefulness of the list for EAP teachers and students. Gray and Biber (2013) suggested three structural categories: (a) verb-based frames (frames containing one or more verbs, e.g. must be * to); (b) other-content-word frames (frames containing one or more content words except verbs, e.g. on the * hand); and (c) function-word frames (frames containing only function words, e.g. the * of this). Simpson-Vlach and Ellis’ (2010) functional taxonomy, adapted from that of Biber et al. (2004), also posits three primary categories (referential, stance, and discourse expressions), with several levels of sub-categories. We analyzed the frames based on the primary and second-level categories only (see Section 3.2), as many of the more fine-grained functions were not applicable to RA introductions.

This classification required a substantial amount of concordance analysis. For each occurrence of a p-frame, we determined its functional category based on the semantics of the
variant and the context in which it occurred. This approach inevitably resulted in some p-frames being identified as multi-functional. For example, in this * it is was labeled as both ‘referential’ and ‘discourse-organizing’, since it contained some variants referring to research context (e.g. context and setting) and others to textual elements (e.g. essay and study). While this variant-based approach to functional analysis has been commonly adopted by previous researchers (e.g. Fuster-Márquez & Pennock-Speck, 2015; Römer, 2010), some researchers have raised concerns regarding its contingent nature. Grabowski (2015) proposed a fixed-frame-based approach that assigns functional labels to p-frames based on “the nature of their fixed components rather than the semantics of slot-fillers and/or longer chunks of texts within a given p-frame” (p.271), arguing that p-frames and lexical bundles are distinctive constructs and p-frames can be functionally analyzed independently from their textual realizations. While recognizing the rationale and value of the fixed-frame-based approach, we see the variant-based approach as well suited for the purposes of the current study, as the discourse functions of the textual realizations of the p-frames would likely prove useful for helping EAP writers acquire contextually appropriate uses of those variants.

2.2.4. Instructor and student writer review

Before finalizing the list, we solicited reviews of a subset of the list from two experienced academic writing instructors and two student writers enrolled in the MA TESL program at a large public university in the U.S. While previous studies that integrated rater assessment of the pedagogical value of their lists all relied solely on expert or teacher perspectives (e.g. Ackermann & Chen, 2013; Simpson-Vlach & Ellis, 2010), we considered it informative to include learner perspectives as well to ensure the list’s potential usefulness to both EAP teachers and learners. For
the review, the raters were provided with a random sample of 50 five-word p-frames and 50 six-word p-frames; for each p-frame, they also received information on its frequency, the number of variant types, the actual variants, and the number of texts and disciplines in which it occurred. They were then asked to rate each p-frame using the following four-point Likert scale:

1 = pattern not recognizable; frame not useful
2 = pattern recognizable; frame not useful
3 = pattern recognizable; frame somewhat useful
4 = pattern recognizable; frame very useful

In our view, the pedagogical value of the p-frames on the list will likely vary substantially depending on the level of experience and expertise of the academic writer. As such, rather than using the ratings of a small group of instructors and student writers to include and exclude specific p-frames, we were more interested in obtaining a preliminary sense of the proportion of the p-frames on the list that may be pedagogically useful to either EAP instructors or learners, or both. Thus, in analyzing the results of the ratings, we considered a p-frame to be useful if it earned a total score of at least 5 (i.e. an average score of at least 2.5 across the two raters in a group) from either the instructor group or the learner group. We note that our criterion of 5 points out of a maximum of 8 was more stringent than the criterion of 9 points out of a maximum of 24 adopted by Ackermann and Chen (2013) in rating the pedagogical usefulness of academic collocations. In total, 91 p-frames (91%) received a total score of 5 or more from one or both groups, indicating that the overwhelming majority of the p-frames on the list may be considered useful by either EAP instructors or learners, or both. The p-frames that were scored under 5 by both groups were mostly bundle-like frames (e.g. play an
important role in *, with somewhat incoherent variants such as coordinating, modifying, etc.). This observation was also supported by comments from the reviewers. For example, one teacher reviewer noted that “I think play an important role is good, but question the role of the p-frame in helping students develop their ideas further.”

3. Results

This section presents the results of the structural and functional analysis of the final, filtered list of p-frames extracted from the corpus of social science RA introductions. The first subsection details the structural categorization of the p-frames based on Gray and Biber’s (2013) taxonomy, and the second the functional categorization based on Simpson-Vlach and Ellis’ (2010) taxonomy. More emphasis is placed on the functional analysis, given its greater importance in pedagogy. Our analysis shows clear differences in both structure and function between five-word and six-word p-frames.

3.1. Structural categorization

Table 2 summarizes the distribution of five-word and six-word p-frames by structure. The majority of five-word p-frames are other-content-word based (64.3%, n = 238), followed by verb-based frames (28.9%, n = 107). Only a small proportion of five-word p-frames consist entirely of function words (6.8%, n = 25), partly because many function-word frames were found to be part of a six-word p-frame and therefore removed in the manual filtering stage. For six-word p-frames, verb-based frames account for a larger proportion (54.8%, n = 46) than other-content-word frames (45.2%, n = 38). No six-word p-frame consists of function words only. The differences in the
distribution of five-word and six-word p-frames may not be surprising, given the increased likelihood of encountering verbs or other content words in longer sequences.

Table 2
Distribution of the p-frames by structure.

| Length | Verb-based frames | Other-content-word frames | Function-word frames | Total |
| --- | --- | --- | --- | --- |
| Five-word | 107 (28.9%) | 238 (64.3%) | 25 (6.8%) | 370 |
| Six-word | 46 (54.8%) | 38 (45.2%) | 0 (0.0%) | 84 |
| All | 153 (33.7%) | 276 (60.8%) | 25 (5.5%) | 454 |

Some examples of p-frames in each structural category are presented below. The words in square brackets indicate the variants that fill the open slot in each frame.

a. Verb-based frames:
we find [little, no, strong, suggestive, weak] evidence that
the [aim, purpose, goal, objective] of this article is

b. Other-content-word frames:
a brief [account, description, overview, reminder, review] of the
in the present study we [investigated, examine(d), focus, test(ed)]

c. Function-word frames:
one of the most [basic, common, fundamental, important, prevalent, significant]
the [degree, extent, height, spread] to which the
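The three structural categories can be read as a simple precedence rule over the fixed words of a frame: any verb makes the frame verb-based; otherwise any other content word makes it an other-content-word frame; otherwise it is a function-word frame. The sketch below is only an illustration of that rule, not the authors' procedure. The function name `structural_category` and the coarse tags ('V', 'C', 'F') are hypothetical, and the tags are assigned by hand here rather than by a part-of-speech tagger.

```python
def structural_category(tags):
    """Classify a frame from coarse tags of its fixed words:
    'V' = verb, 'C' = other content word, 'F' = function word."""
    if "V" in tags:
        return "verb-based"
    if "C" in tags:
        return "other-content-word"
    return "function-word"


# Hand-assigned tags (an assumption) for the fixed words of three frames quoted above.
examples = {
    "we find * evidence that": ["F", "V", "C", "F"],
    "a brief * of the": ["F", "C", "F", "F"],
    "the * to which the": ["F", "F", "F", "F"],
}
labels = {frame: structural_category(tags) for frame, tags in examples.items()}
for frame, label in labels.items():
    print(f"{frame}: {label}")
```

Checking for verbs first mirrors the way a frame such as the * of this article is, which contains both nouns and a verb, is listed among the verb-based frames.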
3.2. Functional categorization

Table 3 summarizes the distribution of five-word and six-word p-frames by primary function. For five-word p-frames, referential frames make up the largest category (55.1%, n = 204), followed by stance frames (18.9%, n = 70) and discourse organizing frames (18.9%, n = 70). For six-word p-frames, however, discourse organizing frames account for the largest proportion (64.3%, n = 54), followed by referential frames (15.5%, n = 13) and stance frames (13.1%, n = 11). A small proportion of five-word (7.0%, n = 26) and six-word (7.1%, n = 6) p-frames were found to be multifunctional, with their functions varying depending on the variants.

Table 3
Distribution of the p-frames by primary function.

| Length | Referential | Stance | Discourse | Multifunction | Total |
| --- | --- | --- | --- | --- | --- |
| Five-word | 204 (55.1%) | 70 (18.9%) | 70 (18.9%) | 26 (7.0%) | 370 |
| Six-word | 13 (15.5%) | 11 (13.1%) | 54 (64.3%) | 6 (7.1%) | 84 |
| All | 217 (47.8%) | 81 (17.8%) | 124 (27.3%) | 32 (7.0%) | 454 |

In the rest of this section, we present some examples of p-frames in different primary and second-level functional categories and discuss how they are used in context in social science RA introductions. Entries with substantial overlap in terms of structure, function, and variants are collapsed to capture their commonality, when doing so does not lose important details about the structure, function, and variants of individual p-frames. For example, the p-frames the aim of this *, the purpose of this *, the aim of the *, and the purpose of the * are collapsed into the aim/purpose of this/the *. Due to space constraints, only one p-frame along with some of its most frequent
variants is provided to illustrate each functional category. The full list of p-frames and their complete variants is provided in Appendix A.

3.2.1. Referential p-frames

As shown in Table 3, overall, referential p-frames make up the largest category. Table 4 summarizes the proportions of p-frames in the five subcategories of referential p-frames in Simpson-Vlach and Ellis’ (2010) taxonomy. We did not find frames functioning as vagueness markers (i.e. phrases indicating imprecise reference, e.g. and so on) in our data. The largest functional subcategory for all frames was specification of attributes. A p-frame in this subcategory identifies specific attributes of a following nominal or clause, as illustrated in a.1.

Table 4
Subcategories of referential p-frames.

| Length | Specification of Attributes | Identification and Focus | Contrast and Comparison | Deictics and Locatives | Vagueness Markers | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Five-word | 164 (80.4%) | 18 (8.8%) | 10 (4.9%) | 12 (5.9%) | 0 (0.0%) | 204 |
| Six-word | 10 (76.9%) | 0 (0.0%) | 2 (15.4%) | 1 (7.7%) | 0 (0.0%) | 13 |
| All | 174 (80.2%) | 18 (8.3%) | 12 (5.5%) | 13 (6.0%) | 0 (0.0%) | 217 |

a. Referential p-frames

a.1. Specification of attributes, e.g. the presence or absence of [data, information, feature]
Ex. 1. In both experiments, we manipulate the presence or absence of [information] intended to trigger ...
Ex. 2. Moreover, our reliance on the presence or absence of [data] from a long-running data series provides greater coverage ...
Identification and focus was the second largest subcategory of referential expressions. In RA introductions, identification and focus frames either introduce the focus of previous literature or establish the focus area of one’s own study, as illustrated in a.2.

a.2. Identification and focus, e.g. focus(-ing, -ed, -es) on the [consequences, effect(s), efficacy, impact, implications] of
Ex. 3. In contrast, I focus on the [effects] of liquidity constraints on the extensive margin ...
Ex. 4. Studies of domestic courts usually focus on the [role] of courts in serving as deciders of contentious issues.

Contrast and comparison frames are relatively small in number. Many frames in this category are used to introduce one’s own research in relation to previous literature, sometimes highlighting its unique focus, as illustrated in a.3.

a.3. Contrast and comparison, e.g. is [also, clearly, closely, inherently, positively] related to the
Ex. 5. This article is [also] related to the literature on savings, growth, and investment.
Ex. 6. My policy analysis is [closely] related to the personnel economics literature on incentive contracts ...
Deictic and locative frames are also small in number. Such p-frames are often used to provide contextual information about one’s research site or to contextualize one’s own research in a specific time period or location relative to previous research, as illustrated in a.4.

a.4. Deictic and locative, e.g. at the [beginning, end, start, time] of the
Ex. 7. Notably, the turn to the corporeal at the [end] of the twentieth century has had a salutary effect ...
Ex. 8. I find that Christian and Islamic communities had, at the [time] of the survey, the most positive impact on respect for religious freedom in Ibadan ...

3.2.2. Stance p-frames

Stance expressions provide a means for conveying one’s attitude, perspective, or position toward an event, action, or proposition. Simpson-Vlach and Ellis (2010) suggested six subcategories of stance expressions, namely, hedges, epistemic stance, expressions of ability and possibility, evaluation, obligation and directive, and intention/volition and prediction. However, we did not find frames in the last two subcategories in our data (Table 5).

Table 5
Subcategories of stance p-frames.

| Length | Hedges | Epistemic | Ability | Evaluation | Obligation | Intention | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Five-word | 13 (18.6%) | 19 (27.1%) | 3 (4.3%) | 35 (50.0%) | 0 (0.0%) | 0 (0.0%) | 70 |
| Six-word | 1 (9.1%) | 6 (54.5%) | 0 (0.0%) | 4 (36.4%) | 0 (0.0%) | 0 (0.0%) | 11 |
| All | 14 (17.3%) | 25 (30.9%) | 3 (3.7%) | 39 (48.1%) | 0 (0.0%) | 0 (0.0%) | 81 |
Expressions in the hedges subcategory are known to play a crucial role in academic writing as they allow writers to express uncertainty regarding the truth value of their statements, enabling them not only to show modesty and reservation but also to avoid personal accountability (Hyland, 1994). In our data, hedges were often expressed through adjectives introduced by the copula be, as illustrated in b.1.

b. Stance p-frames

b.1. Hedges, e.g. are [less, relatively, more, not, also] likely to be
Ex. 9. Instead, we have shown that as oil wealth rises, autocracies are [less] likely to be ousted by groups that would initiate new dictatorships ...
Ex. 10. Most importantly, empirical research has repeatedly shown that evangelical Protestants are [relatively] likely to be lower class.

Epistemic stance frames are somewhat similar to hedges in that they also include expressions of uncertainty. However, such frames have more to do with “knowledge claims or demonstrations” and “reports of claims by others” (Simpson-Vlach & Ellis, 2010, p.506), as illustrated in b.2.

b.2. Epistemic stance, e.g. may or may not be [useful, protective]
Ex. 11. ... they may instead do so in their first language, which may or may not be [useful] in helping them develop literacy in their L2.
Ex. 12. However, perceived control may or may not be [protective] against mortality ...
The ability and possibility frames express or introduce some possible action or proposition. In social science RA introductions, the ability and possibility frames are often used to justify or rationalize research focus or design, as illustrated in b.3.

b.3. Ability and possibility, e.g. allows us to [address, assess, explore, investigate, measure, observe, study] the
Ex. 13. The time-series dimension allows us to [address] the potential endogeneity of network ties.
Ex. 14. This allows us to [explore] the relationship between non-standard work hours and fertility decisions from different perspectives ...

The evaluation category formed the largest group of stance p-frames. P-frames in this category are often used to evaluate one’s own or others’ research through evaluative adjectives, as illustrated in b.4.

b.4. Evaluation, e.g. it is important to [note, emphasize, underscore, acknowledge] that
Ex. 15. It is important to [note] that turning points may vary in valence (negative or positive), severity, and duration across individuals.
Ex. 16. Likewise, it is important to [emphasize] that our experiment only studies a small sampling of the many decision environments ...

3.2.3. Discourse organizing frames
Discourse organizing frames, the second largest group in our list, served four main functions following Simpson-Vlach and Ellis’ (2010) taxonomy: metadiscourse, topic introduction, topic elaboration, and discourse markers. The first subcategory, metadiscourse and textual reference, includes frames that seem to be genre-specific, signaling the outline of the article, as illustrated in c.1.

Table 6
Subcategories of discourse organizing p-frames.
Length      Metadiscourse   Topic introduction   Topic elaboration   Discourse markers   Total
Five-word   33 (47.1%)      13 (18.6%)           23 (32.9%)          1 (1.4%)            70
Six-word    32 (59.3%)      20 (37.0%)           1 (1.9%)            1 (1.9%)            54
All         65 (52.4%)      33 (26.6%)           24 (19.4%)          2 (1.6%)            124

c. Discourse organizing frames
c.1. Metadiscourse, e.g. the article/paper is [organized, structured] as follows
Ex. 17. The article is [structured] as follows: first, the literature on learners’ cognitive processes in L2 pragmatics research is reviewed
Ex. 18. The paper is [organized] as follows: Section 2 presents the social choice environment.

The subcategory of topic introduction and focus signals the topic or the goal of the research. This category, as Simpson-Vlach and Ellis (2010) noted, functionally overlaps with the identification and focus category under referential expressions to some degree. The main
difference between the two is that the topic introduction and focus frames serve a more “global discourse organizing function of introducing a topic,” as illustrated in c.2, whereas the identification and focus frames have more to do with the “local referential function of identification” (p. 507).

c.2. Topic introduction and focus, e.g. the primary [purpose, goal, aim, objective, contribution] of this study/article/paper
Ex. 19. The primary [purpose] of this study was to classify the regime types for twenty-four countries in the Americas ...
Ex. 20. Accordingly, the primary [goal] of this study is to test and extend the metatheoretical framework proposed by Ferris and colleagues
The topic elaboration subcategory relates to explicating and elaborating a topic previously introduced. Many frames in this category include phrases signaling a cause/reason and effect relationship, as illustrated in c.3.

c.3. Topic elaboration, e.g. to [assess, estimate, evaluate, examine, explore, measure, study, test] the effect(s) of
Ex. 21. Hypotheses are developed to [evaluate] the effects of item positioning on response behavior under the three mechanisms.
Ex. 22. The test was designed to [estimate] the effect of paid search on sales ...
Discourse markers generally serve to connect ideas smoothly and logically. Our list contains only a few p-frames functioning as discourse markers (e.g. c.4), partly because many of them were considered to be better represented as lexical bundles rather than p-frames (e.g. on the other hand *) and were thus removed at the manual filtering stage.

c.4. Discourse markers, e.g. in addition to the [literature, amount, outcome, above-mentioned]
Ex. 23. In addition to the [literature] on contract theory and mechanism design with limited commitment, our analysis is related to two other strands of literature ...
Ex. 24. In addition to the [above-mentioned] reasons, an early start of FL learning has been uncritically accompanied by expectations of superior L2 outcomes ...

The functional categories of the p-frames discussed above are not meant to be taken as definitive and exclusive, as some have multiple functions, but rather as indications of the most salient function they tend to fulfill in social science RA introductions.

4. Discussion

The purpose of this study was to derive a pedagogically useful list of p-frames from a corpus of a specific part-genre, i.e. research article introductions, in six social science disciplines. This research aim was motivated by insights into the importance of formulaic sequences in academic English as well as the variation in formulaic language use across different registers, genres, and disciplines (e.g. Biber et al., 2004; Cortes, 2013; Coxhead & Byrd, 2007; Cunningham, 2017; Grabowski, 2015; Hyland, 2008), the success of previous corpus-based efforts in compiling lists of academic formulaic expressions (e.g. Simpson-Vlach & Ellis, 2010; Martinez & Schmitt, 2012)
and the absence of academic p-frame lists, and the perceived value of p-frame lists for EAP writing pedagogy (Cunningham, 2017; Fletcher, 2006, 2011; Gray & Biber, 2013). The study thus serves to fill an important gap in academic formulaic language research.

Recognizing the limitations of the bundles-to-frame approach to p-frame extraction (e.g. Cunningham, 2017; Römer, 2010), we argued for and adopted the fully inductive approach in which p-frames are identified based on all continuous lexical sequences, rather than just lexical bundles, found in the corpus (Fuster-Márquez & Pennock-Speck, 2015; Grabowski, 2015; Gray & Biber, 2013). A combination of corpus statistics was used to extract an initial set of p-frame candidates with adequate frequency, variant diversity, and range across disciplines using kfNgram (Fletcher, 2011). These candidates were then manually filtered in several steps to ensure their semantic completeness and pedagogical value. The resulting 370 five-word p-frames and 84 six-word p-frames were analyzed using Gray and Biber’s (2013) structural taxonomy and Simpson-Vlach and Ellis’ (2010) functional taxonomy. Overall, the majority of p-frames (60.8%) were other-content-word (excluding verbs) frames. By contrast, although Cunningham (2017) used a different taxonomy (Biber et al., 1999) for structural analysis, it is clear that the majority of the p-frames identified from that study's corpus of mathematics research articles were verb-based.

The functional analysis was performed using a variant-based approach (e.g. Fuster-Márquez & Pennock-Speck, 2015; Römer, 2010), which examines the functions of specific realizations of each p-frame in context, rather than a fixed-frame-based approach (Grabowski, 2015), which determines the function of each p-frame based on its fixed components.
This decision was based on the consideration that the functional categories assigned to different variants of the p-frames in context may prove useful in helping EAP learners acquire contextually appropriate uses of the p-frames and their variants. The variant-based approach resulted in a subset of p-frames being categorized
in more than one functional category. Overall, referential p-frames accounted for the largest proportion (47.8%), but the majority of six-word p-frames (64.3%) were discourse organizing frames. The final list is organized by function and presented in Appendix A, with multifunctional p-frames listed under separate categories with the corresponding variants.

Evaluation of a random sample of 100 p-frames by a panel of academic writing instructors and student writers indicated that the overwhelming majority (91%) of the p-frames on the list were considered pedagogically useful by the instructors, the student writers, or both. While most previous studies solicited review by experts only, such as instructors, testers, publishers, and lexicographers (e.g. Ackerman & Chen, 2013; Simpson-Vlach & Ellis, 2010), we considered it useful to include student or learner perspectives as well, as ultimately the list is intended to serve their learning needs. Previous studies also commonly used the review by a large panel of experts to eliminate candidate expressions that had already been filtered by the researchers (e.g. Ackerman & Chen, 2013; Martinez & Schmitt, 2012). While we acknowledge the benefits of having a larger panel of reviewers with diverse academic backgrounds, we also deem it difficult to adequately represent the pedagogical needs of all academic writers, given the many criteria that should be considered (e.g. L1 background, level of experience and expertise in academic writing, discipline, etc.). As such, the use of any panel’s judgment as an absolute criterion to include or exclude candidate p-frames does not necessarily constitute the optimal solution, as confirmed by our panel members. It may indeed be more productive to leave room for EAP teachers and learners to make their own judgment based on their specific pedagogical contexts and learning needs.
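The inductive extraction step summarized in this discussion (collecting every continuous n-word sequence, opening one slot at a time, and keeping frames that meet frequency, variant-diversity, and disciplinary-range criteria) can be illustrated with a minimal Python sketch. It is not the kfNgram implementation used in the study. The function name `extract_pframes`, the toy corpus, and the frequency and variant thresholds are hypothetical values chosen for illustration; only the requirement that a frame occur in at least two disciplines is taken from the study.

```python
from collections import defaultdict


def extract_pframes(docs_by_discipline, n=5, min_freq=2,
                    min_variants=2, min_disciplines=2):
    """Return {frame: {filler: count}} for n-word frames with one open slot.

    docs_by_discipline maps a discipline name to a list of tokenized texts.
    All thresholds are illustrative assumptions, not the study's settings.
    """
    fillers = defaultdict(lambda: defaultdict(int))  # frame -> filler -> count
    disciplines = defaultdict(set)                   # frame -> disciplines seen

    for discipline, docs in docs_by_discipline.items():
        for tokens in docs:
            # Every continuous n-word sequence is a candidate, not only
            # sequences that already qualify as lexical bundles.
            for start in range(len(tokens) - n + 1):
                gram = tokens[start:start + n]
                # Open one slot at a time, in any position.
                for slot in range(n):
                    frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
                    fillers[frame][gram[slot]] += 1
                    disciplines[frame].add(discipline)

    selected = {}
    for frame, counts in fillers.items():
        frequency = sum(counts.values())
        if (frequency >= min_freq
                and len(counts) >= min_variants
                and len(disciplines[frame]) >= min_disciplines):
            selected[" ".join(frame)] = dict(counts)
    return selected


# Toy corpus: one tokenized sentence fragment per (hypothetical) discipline.
toy_corpus = {
    "sociology": [["it", "is", "important", "to", "note", "that"]],
    "economics": [["it", "is", "important", "to", "emphasize", "that"]],
}
frames = extract_pframes(toy_corpus, n=6)
print(frames)
```

On this toy corpus the only frame that survives all three filters is the one whose slot is filled by both "note" and "emphasize". Candidates selected this way would still need the manual filtering for semantic completeness and pedagogical value described above, which a frequency-based sketch cannot replace.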

Some overlap exists between p-frames and other types of academic expressions. For example, complete variants of the p-frames, when examined individually, are reminiscent of formulas and phrases, and the fillers of the p-frames may remind one of collocations. However, unlike
individual formulas and phrases, p-frames provide information about patterns and their variability, and unlike collocations presented in isolation, p-frames contextualize co-occurring words in syntactic patterns. As such, the p-frame list constitutes a useful addition to existing lists of academic vocabulary (Coxhead, 2000; Gardner & Davies, 2013), collocations (Ackerman & Chen, 2013), and continuous formulaic sequences (Martinez & Schmitt, 2012; Morley, 2015; Simpson-Vlach & Ellis, 2010) reviewed earlier.

5. Conclusion

The current study has extracted, analyzed, and evaluated a pedagogically useful list of p-frames from a corpus of social science research article introductions. Focusing on six social science disciplines, this study sits somewhere in the middle of the discipline specificity continuum. While we did not focus on inter-disciplinary variation, the identification of p-frames that are uniquely useful in specific disciplines would certainly constitute a productive avenue of future research (e.g. Cunningham, 2017). Additionally, although the p-frames identified in our study all occur in at least two disciplines, the p-frame variants may be analyzed in terms of their specificity to individual disciplines (e.g. Fuster-Márquez, 2014; Grabowski, 2015).

The focus on the specific part-genre of RA introductions may be both a limitation and a strength. On the one hand, it limits the pedagogical value of the list compiled primarily to this part-genre. On the other hand, awareness of linguistic variation across genres and part-genres is critical to the development of EAP learners’ genre competence, and resources targeting high-stakes genres and part-genres such as RA introductions will prove valuable for pedagogy aimed at promoting that awareness (Cortes, 2013; Hyland, 2007). The research outcome generated in this study also paves
the way for our ongoing research on aligning p-frames with rhetorical moves and steps in RA introductions, and on identifying p-frames for other RA part-genres.

The pedagogical applications of the p-frame list compiled need to be carefully considered. For EAP courses that take a genre approach to teaching academic writing, the p-frame list can serve as a useful resource for assisting students’ analysis of language features that characterize RAs. However, in our view, lists of different types of academic expressions are best used in an integrative way to maximize their potential for promoting students’ genre competence. For example, as students identify important collocations in RAs, the p-frame list can be a handy tool to help them see the range of contexts or syntactic environments in which they occur in RAs. Similarly, as students notice formulas that are frequently used in RAs, the p-frame list can help them see the patterns into which such formulas fit as well as related variants that they could use. We also expect the list to serve as one of many useful reference tools for novice social science scholars as they engage in RA writing.

Our future research will investigate the pedagogical uses of this list in genre-based academic writing classrooms and actual RA writing contexts. We also call for more empirical research examining the feasibility and effectiveness of integrative pedagogical applications of the different types of academic formulaic expression lists to validate existing lists, identify best practices in using them, and inform future efforts in compiling new lists.

Appendix A. The complete phrase-frame list for social science research article introductions

The complete list of p-frames can be found at http://www.personal.psu.edu/xxl13/download.html.

References
Ackerman, K., & Chen, Y-H. (2013). Developing the Academic Collocation List (ACL) – A corpus-driven and expert-judged approach. Journal of English for Academic Purposes, 12, 235-247.
Anthony, L. (2017). AntConc (Version 3.5.0) [Computer Software]. Tokyo, Japan: Waseda University. Available from http://www.laurenceanthony.net/software
Biber, D. (2009). A corpus-driven approach to formulaic language in English. International Journal of Corpus Linguistics, 14, 275-311.
Biber, D., & Conrad, S. (2009). Register, genre, and style. Cambridge: Cambridge University Press.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at ...: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25, 371-405.
Biber, D., Leech, G., Johansson, S., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. London: Longman.
Cortes, V. (2013). The purpose of this study is to: Connecting lexical bundles and moves in research article introductions. Journal of English for Academic Purposes, 12, 33-43.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213-238.
Coxhead, A., & Byrd, P. (2007). Preparing writing teachers to teach the vocabulary and grammar of academic prose. Journal of Second Language Writing, 16, 129-147.
Cunningham, K. J. (2017). A phraseological exploration of recent mathematics research articles through key phrase frames. Journal of English for Academic Purposes, 25, 71-83.
Fletcher, W. H. (2006). “Phrases in English” Home. Available from http://phrasesinenglish.org/
Fletcher, W. H. (2011). KfNgram. Annapolis, MD: USNA.
Fuster-Márquez, M. (2014). Lexical bundles and phrase frames in the language of hotel websites. English Text Construction, 7, 84-121.
Fuster-Márquez, M., & Pennock-Speck, B. (2015). Target frames in British hotel websites. International Journal of English Studies, 15, 51-69.
Gardner, D., & Davies, M. (2013). A new academic vocabulary list. Applied Linguistics, 35(3), 305-327.
Grabowski, Ł. (2015). Phrase frames in English pharmaceutical discourse: A corpus-driven study of intradisciplinary register variation. Research in Language, 13, 266-291.
Gray, B., & Biber, D. (2013). Lexical frames in academic prose and conversation. International Journal of Corpus Linguistics, 18, 109-136.
Herbst, T. (2011). Choosing sandy beaches – Collocations, probabemes and the idiom principle. In T. Herbst, S. Faulhaber, & P. Uhrig (Eds.), The phraseological view of language (pp. 27-57). Berlin: Walter de Gruyter.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge: Cambridge University Press.
Hyland, K. (1994). Hedging in academic writing and EAP textbooks. English for Specific Purposes, 13, 239-256.
Hyland, K. (2007). Genre pedagogy: Language, literacy and L2 writing instruction. Journal of Second Language Writing, 16, 148-164.
Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27(1), 4-21.
Johansson, S. (2011). Corpus, lexis, discourse: A tribute to John Sinclair. In T. Herbst, S. Faulhaber, & P. Uhrig (Eds.), The phraseological view of language (pp. 7-26). Berlin: Walter de Gruyter.
Martinez, R., & Schmitt, N. (2012). A phrasal expressions list. Applied Linguistics, 33, 299-320.
Morley, J. (2015). The Academic Phrasebank: An academic writing resource for students and researchers. Manchester, UK: The University of Manchester.
Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford: Oxford University Press.
Paltridge, B. (2004). Academic writing. Language Teaching, 37, 87-105.
Römer, U. (2010). Establishing the phraseological profile of a text type: The construction of meaning in academic book reviews. English Text Construction, 3, 95-119.
Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list: New methods in phraseology research. Applied Linguistics, 31, 487-512.
Sinclair, J. McH. (1991). Corpus concordance collocation. Oxford: Oxford University Press.
Sinclair, J. McH., & Carter, R. (Eds.) (2004). Trust the text: Language, corpus and discourse. London/New York: Routledge.
Stubbs, M. (2009). Technology and phraseology: With notes on the history of corpus linguistics. In U. Römer & R. Schulze (Eds.), Exploring the lexis-grammar interface (pp. 15-31). Amsterdam/Philadelphia: John Benjamins.
Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford: Oxford University Press.

Xiaofei Lu is Associate Professor of Applied Linguistics and Asian Studies at The Pennsylvania State University, where he directs the graduate programs in the Department of Applied Linguistics. His research interests are primarily in corpus linguistics, intelligent computer-assisted language learning, English for Academic Purposes, and second language writing. He is the author of Computational Methods for Corpus Annotation and Analysis (2014, Springer).

Jungwan Yoon is a Ph.D. candidate in the Department of Applied Linguistics at The Pennsylvania State University. Her research interests include academic literacy development, second language writing, corpus linguistics, and discourse analysis.

Olesya Kisselev is a Ph.D. candidate in the Department of Applied Linguistics at The Pennsylvania State University. Before coming to Penn State, she was an instructor and curriculum developer in the Russian Flagship Program at Portland State University. Her research interests include corpus linguistics and discourse analysis, especially as they apply to the study of various aspects of second language and heritage language acquisition.