DOI 10.1515/phras-2022-0005 YoP 2022; 13: 55–80
Laura Giacomini
The contextual behaviour of specialised
collocations: typology and lexicographic
treatment
Abstract: A corpus-based analysis of specialised phraseology can shed light on
the role of phrasal context in terminology. This contribution describes the behav-
iour of constituents of simple and complex specialised collocations in technical
texts and the way in which these are distributed in different contexts ranging from
the immediate surroundings of a node to the inclusion of a much larger portion
of text. The contextual behaviour of specialised collocations is exemplified by
English and Italian terms in the domain of photovoltaic technology. This contri-
bution aims to identify and classify specialised collocations along their formation
modalities and contexts, as well as to discuss the impact of these phenomena on
their representation in LSP lexicographic resources. Issues and options concern-
ing the extraction of collocations and contexts are also addressed.
Keywords: phraseology, specialised collocations, complex collocations, technical
texts, LSP dictionary
1 Introduction
The present article deals with the topic of specialised collocations, with the aim of
discussing the distribution of constituents of collocations in technical texts and
the way in which this relates to a notion of context that ranges from the immediate
surroundings of a node to the inclusion of a much larger portion of text. In order
to do so, we will start with a rather broad concept of context as “a frame […] that
surrounds a [focal] event being examined and provides resources for its appropri-
ate interpretation” (Goodwin and Duranti 1992: 3; cf. also Goffman 1974 for the
notion of frame) but without indulging in further distinctions between context
and co-text (cf. Lyons 1995) of collocation constituents. In this way we want to
contribute to research on specialised phraseology by presenting an overview of
different types of collocational span (Evert 2009), by examining their interplay
Laura Giacomini, Innsbruck University, Innsbruck, Austria, Laura.Giacomini@uibk.ac.at
with contextual properties and by discussing their impact on the (re)presentation
of specialised collocations in LSP lexicographic resources.
Despite being a comparatively underrepresented topic in terminology
research, the study of specialised phraseology can provide valuable insight into
the role of phrasal context in terminology (cf. Sinclair and Carter 2004 on the
phrasal nature of language). As pointed out in Giacomini et al. (2020), complex
collocations in particular should be the focus of attention in general and spe-
cialised lexicography because of their high productivity and key phraseological
significance (cf. also Gouws 2015: 184–185).
The contextual behaviour of specialised collocations will be exemplified by
English and Italian terms in the domain of photovoltaic technology. This domain
has a rich terminology which interfaces with several disciplines, making special-
ised collocations a varied phenomenon in their denotative content and termino-
logical composition. Section 2 provides an operational, global definition which
encompasses specialised collocations and multi-word terms and introduces a
typology of specialised collocations, ranging from simple collocations to differ-
ent forms of complex collocations. The typology reflects the formation processes
of shorter or longer collocational sequences in technical texts.
The contribution continues in Section 3 with the description of the English
and the Italian corpora and of the behaviour of specialised collocations in
context. The corresponding notion of context is inferred from the observation
of corpus data and involves all the distinct textual components (e.g. phrase and
sentence) in which the constituents of a specialised collocation may appear in
a corpus. Accordingly, a typology of contexts is presented and discussed – also
from the point of view of their identification in a corpus.
Section 4 explores possibilities of presentation for the different collocational
contexts in LSP lexicography and provides an entry draft for a dictionary on pho-
tovoltaic terminology. The entry structure covers all available information con-
cerning the contextual behaviour of the collocations of the lemma and proposes
a method for systematically ordering this information. The article concludes with
some reflections on the relevance of the results of the study and on the need to
continue to explore specialised collocations in further domains in order to fully
appreciate the role they play in shaping the context of specialised texts.
2 The nature of specialised collocations
In general, an exclusive orientation of LSP collocation theory towards the col-
location understanding typical of LGP phraseology appears problematic for a
number of reasons, among which is the distribution of idiomatic phrasemes¹ in
specialised texts. The classification of phraseological data based on the prin-
ciple of ‘idiomaticity’ seems to play a different role in LSP phraseology, where
specialised collocations are often assigned to the set of non-idiomatic phrase-
mes (cf. Gläser 2007: 487). Idiomaticity, i.e. the gradual feature of phraseologi-
cal expressions that display a discrepancy between their literal and figurative
meaning (Burger 2015), has a different status in LSP than in LGP and has often
been described as a non-obligatory feature of specialised phrasemes (cf. Cedillo
2004: 91) in the same way as compositionality has been described as a non-cen-
tral property in terminology (L’Homme and Azoulay 2020: 153). This is supposed
to have a direct impact on the flexibility of the context in which phrasemes, col-
locations in particular, occur in specialised texts. The degree of ‘fixedness’ is
also a criterion for distinguishing different types of specialised phrasemes. It
certainly affects the contextual behaviour of the least fixed phraseological units,
namely specialised collocations, that widely lend themselves to paradigmatic
substitutions and syntagmatic modifications. Paradigmatic substitutions are
found, for example, in the frequent phenomenon of substantive class formation
(Cedillo 2004), i.e. it is often possible to identify, for a given noun in the colloca-
tion, a series of alternative nouns that are members of the same semantic field
(cf. example a):
a) current/overcurrent/lightning/shock/… protection
Syntagmatic modifiability concerns the ability of a collocation to change its struc-
ture by expanding it by means of new elements (cf. example b) and/or adapting it
to new inflectional forms (cf. example c).
b) current protection > reverse current protection
c) battery lead > battery leads
A further issue is the controversial status of complex terms, in particular multi-
word terms (as opposed to specialised collocations) and compounds. Some
authors trace a boundary between terminology and phraseology, ascribing to
multi-word terms a purely naming function (“La terminologie désigne des objets
et concepts alors que la phraséologie formule des relations.”, Gouadec 1994: 173;
cf. also Gläser 2007: 494), whereas the function of specialised collocations would
be the description of relations. This kind of strict duality also agrees with the view
of multi-word terms and specialised collocations as covering complementary
1 ‘Phraseme’ is used in this contribution as a synonym of ‘phraseological unit’ (cf. Corpas Pastor
and Colson 2020: 2).
syntactic patterns, with the former as typical noun phrases (e.g. fault current, or
PV array mounting structure) and the latter as typical verb phrases (e.g. install-
ing a PV array or to route cables). However, as pointed out by Cedillo (2004)
and Giacomini (2021), separating word combinations based on their syntactic
features leads to inconsistencies when trying to explain the semantic equiva-
lence of variants of the kind V+(P)+N (protect against overcurrent) and N+(P)+N
(overcurrent protection or protection against overcurrent). Among complex terms,
compounds are, according to some researchers (cf. Burger 2015: 16), equally
problematic from a phraseological standpoint since they do not meet the fun-
damental criterion of polylexicality of phrasemes. A broader definition of ‘pol-
ylexicality’ as a combinatorial property of lexical morphemes, not exclusively
of words, would help overcome this obstacle and reflect the natural semantic
equivalence of some compounds, e.g. (sth is) roof-mounted, and collocations,
e.g. mount (sth) on the roof.
In the context of this study, we will start from an operational definition of
specialised collocation aimed at a lexicographic description of the phenomenon.
By specialised collocation we mean a combination of two or more words that is typical of a
specialised language, with unitary phraseological meaning and terminological character.
The terminological character of the collocation is independent of the terminological or non-
terminological character of its individual constituents. The degree of idiomaticity of the
specialised collocation is variable, as is the degree of its fixedness.
This is a comprehensive definition, which, relying on a phraseological but also
empirical notion of collocation (cf., among others, Evert 2009 and Bartsch 2004),
is aimed at providing a framework for the treatment of specialised collocations
in a lexicographic resource. It allows considerable flexibility in the treatment of
data extracted from corpora without precluding further restrictions, e.g. of multi-
word terms in the narrow sense.
2.1 Simple and complex specialised collocations: a typology
The notion of collocation as a binary combination is still very much entrenched
in both common language and special language phraseological studies. The
understanding of collocations as n-ary combinations has gained some popu-
larity on the basis of corpus evidence in the recent past (cf., among others,
Seretan 2013; Gouws 2015; Tutin and Kraif 2016) but the nature of complex
collocations is still largely unexplored. This is very limiting with respect to
the actual syntactic and semantic scope of the phenomenon, and it also con-
strains the possibility of carrying out adequate contextual analysis. This view is
likewise reflected in the treatment of specialised collocations in lexicographical
and terminographical resources, in which the focus is primarily on two-word
combinations.
In order to best analyse the context in which the constituents of a special-
ised collocation develop, it is necessary to introduce a typology of collocations
that is not tied to a specific number of constituents or to specific restrictions on
the context itself. The notion of possible context for specialised collocations will
be inferred from the data we are going to examine and will not be defined in
advance. By ‘simple specialised collocation’ (SSC) we mean a base collocation,
terminologically not further decomposable and usually consisting of two ele-
ments. As already pointed out, the constituents of any specialised collocation
may or may not be terms, without thereby affecting the terminological character
of their co-occurrence. Examples of SSCs in English are:
photovoltaic system		 poly-crystalline
junction box			grid-connected
system load			 to match the voltage
amount of energy		 to charge a battery
It is apparent from specialised texts, however, that complex collocations also play
a role in the phraseology of a language, although they have been only marginally
investigated in the past (cf. Giacomini et al. 2020). A study conducted in the
context of learner’s lexicography on general language complex collocations in
Italian and German has revealed the existence of two different kinds of complex
collocation formation (ibid.):
– The first type of complex collocation is built by recursive expansion. The
recursive nature of collocations, described by some researchers as the prop-
erty of collocation constituents to be collocational themselves (cf. Heid
1994; Seretan 2013), implies that a core collocational phrase is progressively
expanded by the addition of new collocates.
– The second type of complex collocation is built by argument complementa-
rity, i.e. the concatenation of simple collocations of a verb matching two or
more of its arguments. This type of formation is sometimes combined with
the first one, depending on the collocational range of the constituents of
simple collocations.
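The two formation processes can be made concrete with a small data model. The following is a minimal sketch rather than part of the original study: the class `Collocation` and the helpers `expand` and `argument_related` are hypothetical names introduced here for illustration, and the examples are taken from Table 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Collocation:
    """A specialised collocation (hypothetical model, not the author's)."""
    text: str
    kind: str  # "SSC", "recursive CSC" or "argument-related CSC"
    source: Optional["Collocation"] = None  # collocation it was expanded from

    def derivation(self) -> List[str]:
        """Return the chain of forms from the base SSC up to this collocation."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.text)
            node = node.source
        return chain[::-1]


def expand(base: Collocation, text: str) -> Collocation:
    """Recursive expansion: derive a CSC from an SSC or from a previous CSC."""
    return Collocation(text, "recursive CSC", base)


def argument_related(verb: str, arguments: List[Tuple[str, str]]) -> Collocation:
    """Argument complementarity: concatenate a verb with the phrases that
    fill two or more of its arguments, given as (role, phrase) pairs."""
    text = " ".join([verb] + [phrase for _role, phrase in arguments])
    return Collocation(text, "argument-related CSC")


ssc = Collocation("match the voltage", "SSC")
csc = expand(expand(ssc, "match the nominal voltage"),
             "match the nominal voltage of the solar array")
print(csc.derivation())

installed = argument_related(
    "install", [("direct object", "a solar panel"), ("locative", "on a roof")]
)
print(installed.text)
```

Keeping a link to the source collocation reflects the idea that a recursively built CSC can itself be the starting point of a further expansion.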
The hypothesis to be tested in this contribution is that the model developed in
the study on general language is transferable to specialised language and is
also valid for English. The application of the descriptive model of complex col-
locations to special language corpora will then serve to explore their contextual
features. A ‘complex specialised collocation’ (CSC) will be defined as a specialised
collocation derived from an SSC. Since its terminological value must by
definition be preserved in the evolution from a simple collocation, in special-
ised languages we usually find a lower number of complex collocations than
in general language, where the only valid criterion for defining a complex
collocation is syntactic, semantic and phraseological typicality. Based on the
previously mentioned formation types for complex collocations, we will distin-
guish also for LSP between ‘recursively built’ CSCs and ‘argument-related’ CSCs.
Table 1 summarises the profiles of the discussed phrasemes and provides some
examples in English and Italian.
Tab. 1: Types of simple and complex specialised collocations and examples in English and
Italian. For each CSC, the original SSC is shown, and the additional constituents in a) are
underlined

Simple specialised collocation (SSC)
    Main characteristics: Not further decomposable collocation, usually binary.
    English examples:
    – renewable energy (SSC)
    Italian examples:
    – energia raggiante (SSC)

Complex specialised collocation (CSC): a) recursively built CSC; b) argument-related CSC
    Main characteristics:
    a) Expansion by the addition of a collocate, usually a modifier/specification of a constituent; can also apply to a CSC.
    b) Concatenated collocations matching the arguments of a verb.
    English examples:
    – match the voltage (SSC) > match the nominal voltage (CSC) > match the nominal voltage of the solar array (CSC)
    – PV system (SSC) > performance of the PV system (CSC)
    – install + solar panel + roof > to install (V) a solar panel (SSC, direct object) on a roof (locative) (CSC)
    Italian examples:
    – modulo al silicio (SSC) > modulo al silicio monocristallino (CSC)
    – circuito aperto (SSC) > tensione di circuito aperto (CSC)
    – radiazione solare + incidere + superficie > radiazione solare (SSC, subject) incidente (V) su una superficie (locative) (CSC)
As highlighted by the examples in the table, argument-related complex colloca-
tions can be constructed not only around verbs but also other word classes that
hold arguments (e.g. nouns such as performance and modulo).
The phraseological and terminological character of a collocation can be
understood as a continuum that fades as the collocation expands and includes
new constituents. This is also evidenced by the results of the extraction of terms
and collocates, with candidates becoming less and less frequent and less and less
specific according to association measures as the collocations expand. Therefore,
a limit to the scope of the collocations under analysis will not be set in advance,
since this limit will be assessed in each case based on corpus data. This aspect
also plays a role in the presentation of collocations and contexts in LSP lexico-
graphic resources (see Section 4).
3 Constituents of specialised collocations and their contextual behaviour
The present study is corpus-based, since the context types of specialised colloca-
tions are verified in a corpus according to the previously defined typology. Two
small comparable corpora, Photovoltaics2021_en and Fotovoltaico2021_it, have
been created for English and Italian, each comprising around 800,000 words and
made up of handbooks and guidelines concerning the field of photovoltaic tech-
nology. The texts, which have been manually collected among online resources,
are addressed to technicians and prospective technicians and cover the topics
of design and installation of photovoltaic systems. From a terminological point
of view, corpus texts are characterised by a high degree of specialisation in both
languages.
The language of the photovoltaics domain, as part of the language related to
renewable energies (e.g. hydroelectricity and wind power) is quite rich and com-
bines terminology from base disciplines (e.g. physics, photochemistry, electro-
chemistry), as well as from sister disciplines (e.g. the construction industry). As
a result, the collocations themselves are quite varied in their denotative content
and terminological composition. The English corpus shows a considerable
amount of nominalisation, verb forms are often passivised, and synthetic structures
like N-V or N-A compounds (e.g. roof-mounted, air-permeable) are frequent,
as is fairly typical in technical texts.
The two corpora have been compiled in Sketch Engine (Kilgarriff et al. 2014)
using the previously selected texts. Sketch Engine has also been used for the
extraction of specialised collocations and the analysis of their contexts. Data are
collected and examined first for English, and the overall result is then compared
to the one obtained by analysing the Italian data. The aim is not to identify exact
correspondences between the two languages at the level of distribution of dif-
ferent specialised collocation types and corresponding contexts, but to test the
applicability of the typology and method to different languages.
The first step of the applied procedure involves the extraction of simple and
complex terms using the Keywords tool. Candidate term lists are obtained based
on the comparison between the two specialised corpora and the general lan-
guage reference corpora English Web 2020 (enTenTen20) for English and Italian
Web 2016 (itTenTen16) for Italian. They are then assigned a keyness score by the
Simple maths method.
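The article names the Simple maths method without spelling it out. As generally described for Sketch Engine, the score divides the smoothed relative frequency of a candidate in the focus corpus by its smoothed relative frequency in the reference corpus. The sketch below assumes this formulation; the smoothing constant `n = 1`, the function names and all counts are illustrative assumptions, not figures from the study.

```python
def per_million(freq: int, corpus_size: int) -> float:
    """Relative frequency of an item, normalised per million tokens."""
    return freq * 1_000_000 / corpus_size


def simple_maths(freq_focus: int, size_focus: int,
                 freq_ref: int, size_ref: int, n: float = 1.0) -> float:
    """Keyness score: (fpm_focus + n) / (fpm_ref + n).

    The constant n smooths the ratio and avoids division by zero when the
    candidate is absent from the reference corpus. n = 1 is an assumption.
    """
    return (per_million(freq_focus, size_focus) + n) / (
        per_million(freq_ref, size_ref) + n
    )


# Invented counts: a candidate occurring 400 times in an 800,000-word focus
# corpus and 2,000 times in a one-billion-word reference corpus.
score = simple_maths(400, 800_000, 2_000, 1_000_000_000)
print(score)  # (500 + 1) / (2 + 1) = 167.0
```

Raising `n` shifts the ranking towards more frequent candidates, whereas a low value favours rare items that are almost exclusive to the focus corpus.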
3.1 Specialised collocations and their contexts in the English corpus
From the list of candidate terms extracted from the English corpus, the most relevant
single-word and multi-word terms² are validated and ten of them selected
considering both the relative frequencies in the focus corpus and the reference
corpus, as well as their keyness score. Single-word terms have been chosen in
such a way that they are not constituents of the multi-word terms at the same
time. The selected terms are shown hereunder in alphabetical order:
(1) amp-hour
(2) charge controller
(3) deep cycle
(4) efficiency
(5) electrical installation
(6) fault current
(7) grid-connected
(8) inverter
(9) junction box
(10) maximum power
(11) operate
(12) photovoltaic system
(13) PV
(14) roof
(15) shading
(16) solar cell
(17) string fuse
(18) surge
(19) voltage drop
(20) wattage
Starting from this list, the contexts of the extracted terms and their collocations
are examined. For this purpose, the Word Sketch and the Multiword Sketch tools
2 According to the terminology used in Sketch Engine.
are used in combination with concordance analysis. The different types of con-
texts emerging for the collocations of the selected terms will now be presented
and discussed by illustrating selected examples.
3.1.1 Embedded constituents in a phrase context
A new constituent is embedded within the collocation. The context is typically
the restricted context of the noun phrase (less frequently: verb, adjective, prepo-
sitional phrase) in which the expansion occurs by means of single adjectival or
adverbial modifiers and nouns. The result is a CSC built recursively from an SSC or
a previous CSC. Embedded constituents (in parentheses) are found in the follow-
ing CSCs:
(1) amp-hour: (total) amp-hour demand;
(2) charge controller: ((MPPT) solar) charge controller;
(3) deep cycle: deep cycle (battery);
(4) efficiency: (power) conversion efficiency;
(5) electrical installation: (requirements for the) electrical installation;
(6) fault current: (d.c.) fault current; fault current (protection);
(7) grid-connected: grid-connected ((PV) system);
(10) maximum power: maximum power (output);
(12) photovoltaic system: (roof-mounted) photovoltaic system; (connect a) photovoltaic
system; photovoltaic (hybrid) system;
(13) PV: (solar) PV system, (off-grid) PV system, ((grid-connected and) stand-alone) PV
system, (installation of a) PV system; PV (module) installation; (thin film) PV cell, PV
cell (material), (crystalline) PV cell; (above roof) PV array;
(14) roof: (asymmetric) duopitched roof;
(17) string fuse: (removable) string fuse;
(18) surge: surge suppression (device);
(20) wattage: (combined) rated wattage
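The parenthesis notation used in this list can be unfolded mechanically into the successive stages of a recursively built CSC. The sketch below assumes that nesting depth mirrors the order of expansion, with the material outside all parentheses as the starting point. This reading and the function name `expansion_steps` are our own assumptions.

```python
from typing import List


def expansion_steps(notation: str) -> List[str]:
    """Unfold a form such as '((MPPT) solar) charge controller' into the
    sequence of collocations obtained by adding one nesting level at a time."""
    depth = 0
    marked = []  # (character, nesting depth) pairs, parentheses excluded
    for ch in notation:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        else:
            marked.append((ch, depth))

    max_depth = max((d for _ch, d in marked), default=0)
    steps = []
    for level in range(max_depth + 1):
        kept = "".join(ch for ch, d in marked if d <= level)
        steps.append(" ".join(kept.split()))  # normalise whitespace
    return steps


print(expansion_steps("((MPPT) solar) charge controller"))
# ['charge controller', 'solar charge controller', 'MPPT solar charge controller']
print(expansion_steps("grid-connected ((PV) system)"))
# ['grid-connected', 'grid-connected system', 'grid-connected PV system']
```

The function handles a single parenthesised form at a time, so entries listing several CSCs have to be split at the separators first.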
These phrasal contexts appear to be relatively stable, even though these
collocations are sometimes expanded by the addition of free, non-collocational
constituents according to the principle of syntagmatic modifiability described
in Section 2. Here are a few examples of this phenomenon, in which additional,
non-collocational items have been highlighted by means of square brackets:
– maximum allowable residual load [available] for the solar array,
– [many] stand-alone inverters,
– [total] energy demand per day,
– [same] nominal voltage,
– load of the [existing] roof covering,
– [advantages of both] lead-calcium and lead-antimony design,
– [quality of these] renewable technology systems,
– silicone or [other] mastic sealant.
It should be noted, however, that constituents of word combinations characterised
by a strong conceptual cohesion are not separated by the addition of free,
non-collocational items. Such units of meaning are, for instance,
– junction box,
– fault current,
– voltage drop,
– short circuit,
– altitude correction factor,
but also the ones mentioned in the previous examples.
3.1.2 Argument-related constituents in a sentence (or VP) context
This type of phenomenon corresponds to argument-related CSCs, with colloca-
tions matching one or more arguments of a verb predicate. The context is there-
fore typically the verb phrase or the sentence. This phenomenon can also apply
to nominalisations that maintain the argument structure of the original verb (cf.
Section 3.2). Argument-related constituents are found in the following examples:
(6) fault current: prevent (d.c.) fault current;
(8) inverter: disconnect the inverter from the grid;
(9) junction box: wire the junction box;
(11) operate: interrupter operates correctly;
(12) photovoltaic system: connect a photovoltaic system;
(13) PV: size a ((grid-connected and) stand-alone) PV system, install a PV system, (electri-
cally) connect a PV array;
(14) roof: mount on a (pitched) roof;
(15) shading: calculate the shading factor;
(16) solar cell: (photovoltaic) solar cell produces X volts;
(18) surge: protect from power surges;
(19) voltage drop: minimise voltage drop
As previously pointed out, constituents of argument-related CSCs can sometimes
be recursively expanded depending on their collocational range (cf. prevent fault
current > prevent d.c. fault current). In the abovementioned examples, embedded
constituents are again indicated in parentheses.
The described collocational and contextual behaviour is independent of the
nature of the verbs involved. They are usually relatively general technical verbs
that are very common in technical sublanguages and apply to a large number of
entities, such as operate, minimise, prevent, disconnect, connect, install, produce,
mount, charge, or cable. They primarily indicate dynamic situations, namely pro-
cesses and events (Lyons 1977: 483). A few others, such as earth, ground, overload,
insulate, or retrofit seem to be more specific but not exclusive of the terminology
of photovoltaic systems. A further class of verbs which are present throughout
the corpus are generic, non-technical verbs, such as make, bring, use, or provide.
We can observe that the most frequent collocational structure is formed by
the verb and a direct object. This reflects a typically neutral style of specialised
technical communication, obtained by means of passivisation or other imper-
sonal forms. A neutral style, however, can also be produced by subject-verb col-
locations such as the following:
– the interrupter operates correctly,
– fuses are not likely to operate under short-circuit conditions,
– when the earth fault interrupter operates, an alarm shall be initiated.
Whenever more than one argument of a verb is collocational in nature, concatenated,
adjacent collocations are built, as in disconnect the inverter from the
grid or photovoltaic solar cell produces X volts. Further examples of this kind are:
– ventilation prevents excessive heat build-up,
– d.c. isolator may be incorporated into the inverter,
– conductors should be suitably protected from mechanical damage,
– blocking diodes should be used in addition to string fuses,
– the amount of sunlight falling onto the face of the PV cell affects its output,
– the amount of energy produced by the array per day.
In some cases, coordinate verbs are able to build parallel CSCs:
– PV systems mounted above or integrated into a pitched roof.
Generally speaking, the displayed contexts appear to be looser than the ones
found in Section 3.1.1. The connection between a verb and its arguments is some-
times interrupted by non-collocational items appearing within the context of the
sentence. Some examples will now be mentioned:
– the amount of sunlight [hitting the array] [also] varies with…,
– the PV array is [typically] mounted on fixed racks,
– all d.c. constituents must be rated, [as a minimum], at Voltage: Voc(stc) x 1.15,
– manual load switching is [sometimes] provided,
– direct or diffuse light [(usually sunlight)] [shining on the solar cells] induces the photo-
voltaic effect.
Additional, non-collocational items have been highlighted by means of square
brackets. They are, for instance, appositives, adverbials, or participle construc-
tions with the function of a relative clause.
3.1.3 Remote constituents beyond the sentence level
Alongside embedded and argument-related constituents, which, as we have seen,
correspond to different types of phrases and are more or less fixed in nature, we
have postulated that there are also broader contexts, above the sentence level,
in which the constituents of specialised collocations can be distributed. Indeed,
we assume that specialised discourse, as it develops in a text, becomes homoge-
neous through textual cohesion and coherence (cf. De Beaugrande and Dressler
1981; Adamzik 2014). We attempted to test whether there exist in a text possibili-
ties and modalities of distribution of lexical constituents of phrasemes beyond
the sentence level, without, for the moment, investigating the causes of such distribution.³
In doing this, we expand the analysis of collocations from the initial
set of 20 combinations listed at the beginning of this section to further validated
combinations, in order to obtain a broader picture of the phenomenon.
By observing the behaviour of simple and complex specialised collocations
in the English corpus, we notice specific patterns of use according to which at
least one constituent is explicitly or implicitly echoed in different sentences,
sometimes interspersed with further sentences. This is an anaphorical repetition
associated with the collocative nature of some terms. The following phenomena
have been identified:
(a) A constituent of a simple or complex collocation is explicitly repeated as such
in subsequent sentences, in which it is associated with further collocations,
as in the first example below, while the anaphorical character of the second
item is signalled by the use of the determiner the in the second example:
A charge controller is connected in between the solar panels and the batteries. The charge
controller operates automatically and ensures that the maximum output of the solar panels
is directed to charge the batteries without overcharging or damaging them.
3 ‘Distribution’ is generically used to indicate the position and arrangement of collocation con-
stituents throughout corpus texts, not in the sense of distributional semantics.
The inclination (or pitch) of the array is to be measured or determined from plan. The
required value is the degrees from horizontal. Hence, an inclination of 0° represents a hori-
zontal array; 90°represents a vertical array.
This happens very often in list-based text structures:
The approach is as follows:
1. Establish the electrical rating of the PV array in kilowatts peak (kWp)
2. Determine the postcode region
3. Determine the array pitch
4. Determine the array orientation
5. Look up kWh/kWp (Kk) from the appropriate location specific table
6. Determine the shading factor of the array (SF) according to any objects blocking the
horizon – using shade factor procedure set out in 3.7.7
(b) A pronoun or adjective, usually personal or demonstrative, refers back to
the constituent of a simple or complex collocation in a preceding sentence,
and introduces a new collocation of that constituent.
PV specific plug and socket connectors are commonly fitted to module cables by the manu-
facturer. Such connectors provide a secure, durable and effective electrical contact. They
also simplify and increase the safety of installation works.
Battery Backup Inverters: These are special inverters which are designed to draw energy
from a battery, manage the battery charge via an onboard charger, and export excess energy
to the utility grid. These inverters are capable of supplying AC energy to selected loads
during a utility outage and are required to have anti-islanding protection.
…each layer extracts energy from each photon from a particular portion of the light spectrum
that is bombarding the cell. This layering of the PV materials increases the overall efficiency…
This second modality seems to be less frequent in the specialised texts that make
up our corpus than type (a). For the sake of communicative clarity, the repeti-
tion of terms seems to be preferable to that of pronouns with anaphoric function,
especially in distinct sentences, in which the distance between the pronoun and
the antecedent reference could easily lead to semantic ambiguities.
(c) A constituent of a simple or complex collocation is explicitly reiterated after
one or more sentences.
The intention does not seem properly anaphoric, yet the same collocational constituent
is found in different sentences without strong connection at the level of discourse.
Where the array frame is mounted on a domestic roof or similar, the likelihood of the frame
being an extraneous-conductive-part is very low – due to the type and amount of material
used between the ground and the roof structure (which will mainly be non-conductive). Even
in the case of an array frame being mounted on a commercial building where mostly steel-
work is used, it is likely that the frame will be either isolated, …
(d) In some cases, one constituent is implicitly reiterated in a later context,
in which another constituent of the collocation appears. In the example
mentioned below, battery capacity is a specialised collocation that can
be identified by observing the structure of the context. We will also call
this phenomenon a subtype of anaphora, since the reprise of battery in e.
Capacity is just implicit.
Battery Inputs and Specifications
a. Days of storage desired/required = 7 days
b. Depth-of-discharge limit (typical value) = 0.8
c. Make/ Model = Exide 6E95-11 (Deep cycle battery)
d. Battery cell voltage = 12 V
e. Capacity = 478 Amp-hour (Ah)
f. System voltage (battery bus voltage) = 24 V
g. Battery round trip efficiency = 0.85 for efficiency batteries.
These different forms of phraseological behaviour ‘distributed’ over several sen-
tences seem to correspond to the typical structures of the textual genre in ques-
tion, as well as to the typical contents and modes of technical writing, including
descriptions of methods, processes and their individual steps, the repeated use
of lexical elements in contiguous or distant sentences, and the schematic style
of lists.
3.1.4 General remarks on the analysis of the context of specialised collocations
As indicated at the beginning of the section, specialised collocations have been
extracted using the Word Sketch and the Multiword Sketch tools of the Sketch
Engine, progressively widening the scope of the analysed text section, but always
remaining within the sentence boundaries. This makes it possible, in particular,
– to study the behaviour of recursively built and argument-related CSCs,
– to highlight their combinability, and
– to observe that recursively built CSCs typically correspond to multi-word
terms in the strict sense.
Contextual analysis of the constituents of specialised collocations within a sentence
reveals unsurprising regularities. As soon as larger portions of the text are
considered, however, the picture becomes considerably more complicated.
Due to their textually irregular distribution, remote constituents beyond the
sentence level are clearly more difficult to detect in the corpus than embedded
and argument-related constituents. A first phase of the analysis has been
carried out manually on a part of the texts in order to identify regularities in
the contextual behaviour of the specialised collocations. In a second phase, the
observations made have been applied to a semi-automatic procedure, with vali-
dated simple collocates searched within textual structures larger than the sen-
tence, i.e. paragraphs and documents, by means of the Corpus Query Language,
such as in the following example:
"junction" "box" []* "junction" "box" !within <s/>
The queries are particularly challenging when dealing with anaphoric pro-forms,
which are not easy to predict. In comparison with the results obtained by
means of Multiword Sketches, no better results have been achieved by analysing
collocation graphs through the #LancsBox tool (Brezina et al. 2020; cf. also Baker
2016; Brezina 2018a).
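The kind of check performed by the query above can also be illustrated outside the corpus tool. The following Python sketch is only an illustration under simplifying assumptions, not the procedure used in the study: the sentence splitter is deliberately naive, the function names are hypothetical, and the sample paragraph is invented for the purpose of the example.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cross_sentence_pairs(text, phrase):
    """Return (i, j) index pairs of distinct sentences that both contain `phrase`."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    hits = [i for i, s in enumerate(split_sentences(text)) if pattern.search(s)]
    return [(i, j) for k, i in enumerate(hits) for j in hits[k + 1:]]


# Invented sample paragraph (not taken from the corpus).
sample = (
    "The junction box is mounted on the back of the module. "
    "It houses the bypass diodes. "
    "Check that the junction box is sealed."
)

pairs = cross_sentence_pairs(sample, "junction box")
print(pairs)
```

A sketch of this kind only finds explicit repetitions; the pro-form in the second sample sentence ("It") goes unnoticed, which is the difficulty mentioned above.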
We have not focused on quantitative data for the time being, as we believe
that the frequency of use of the latter phenomenon is closely linked to the termi-
nology of the specialised field and the conventions of the textual genre, rather
than to strictly phraseological factors. We have therefore limited ourselves to
observing the types of contexts by describing them from a qualitative point of
view. A quantitative analysis, on the other hand, may be of interest for comparing
the two corpora and drawing preliminary conclusions on the contextual behav-
iour of specialised collocations in English and Italian (see Section 3.2).
In Giacomini et al. (2020), the notion of ‘conceptual range’, which refers to
the syntactic level at which the concept of a complex collocation is encoded, was
introduced. Complex collocations built by recursive expansion retain the same
properties as the simple collocations from which they originate. A noun phrase,
for instance, is expanded into a larger noun phrase by the addition of an adjecti-
val modifier, or a verb phrase is expanded into a larger verb phrase by the addi-
tion of an adverbial modifier. The concept encoded by the complex collocation
is specified at the phrase level. Concepts covered by argument-related complex
collocations, on the contrary, are encoded at sentence level (or at least at verb
phrase level). This level is also able to identify complex ‘scenes’ when all syntac-
tic arguments of a verb are involved in a sequence of collocations.
As shown by the previous analysis of specialised collocations, the idea of
conceptual range is also applicable to terminology and is useful for inferring a
notion of collocation context from the data. The context of specialised colloca-
tions is thus understood as a frame that surrounds a focal event (Goodwin and
Duranti 1992) and, specifically, as the portion of text in which components of spe-
cialised collocations appear while still being perceived as a phraseological unit,
which varies both with the manner of expansion from simple to complex colloca-
tions and with the explicit or implicit anaphoric resumption of constituents from
one sentence to subsequent sentences.
3.2 Comparative application to the Italian corpus
The Italian corpus of texts on photovoltaics is comparable in size and composi-
tion with the English corpus. It has been surveyed for the same phenomena as
described in Section 3.1. Analysis has been carried out on the following set of
terms, listed in alphabetical order:
(1) cella fotovoltaica
(2) corrente continua
(3) diodo di bypass
(4) energia elettrica
(5) fonti rinnovabili
(6) fotoelettrico
(7) FV
(8) generatore fotovoltaico
(9) impianto fotovoltaico
(10) installare
(11) irraggiamento
(12) kW
(13) massima potenza
(14) modulo fotovoltaico
(15) nominale
(16) ombreggiamento
(17) radiazione solare
(18) retrofit
(19) silicio
(20) voltaggio
Table 2 presents some examples for each category of context.
Tab. 2: Examples of contextual phenomena regarding specialised collocations in the Italian corpus
Embedded constituents in a phrase context:
potenza nominale variabile
cella fotovoltaica al silicio
fonti rinnovabili tradizionali
punto di massima potenza
energia elettrica e termica
potenza di … kW
sistema FV autonomo
voltaggio di funzionamento
radiazione solare al suolo
effetto fotoelettrico della luce solare
Argument-related constituents in a sentence (or VP) context:
massimizzare l’irraggiamento solare
convergere la radiazione solare su una cella fotovoltaica
montare un diodo di bypass
generare energia elettrica
misurare la corrente continua all’uscita dal generatore fotovoltaico
integrare un modulo fotovoltaico nella copertura
progettare e installare un impianto fotovoltaico
irraggiamento su superficie inclinata
fenomeni di ombreggiamento del campo fotovoltaico
applicazione retrofit in facciata
produzione di energia elettrica da fonti rinnovabili
Remote constituents beyond the sentence level:
Lo schema sintetizza le possibili configurazioni che caratterizzano un
impianto FV. In esso sono presenti cinque insiemi, composti ciascuno
da diversi elementi, che in varie configurazioni caratterizzano le
tipologie di impianto.
Per quanto riguarda la tecnologia, la quota di produzione di celle al
silicio è in crescita e resta la predominante con il 94,2% del totale
prodotto. Il silicio multicristallino con il 56,9% del mercato risulta
essere il più utilizzato rispetto al monocristallino, all’amorfo e al film
sottile. Tuttavia, nuova spinta sta avendo il silicio mono-cristallino […].
Prima di eseguire le misure si consigliano i seguenti controlli:
– verificare che ci siano condizioni di irraggiamento stabili e che non
ci siano nuvole bianche in un cono di 60° di apertura intorno al sole
che possano rendere instabili le misure di radiazione solare; […]
– evitare di fare verifiche tecniche-funzionali nelle giornate afose, al
crescere del contenuto di umidità nell’aria aumenta la constituente
di radiazione diffusa e di conseguenza il rendimento del campo
fotovoltaico è più basso; un semplice espediente per capire se si
è in presenza di umidità eccessiva nell’aria è quello di osservare
la colorazione del cielo: se questo è di un bel blu la radiazione
diffusa è molto bassa, più il colore del cielo tende al bianco più la
constituente diffusa è elevata. […]
– verificare che ci sia una radiazione superiore a 600 W/m2; […]
The three context types of collocational constituents are widely present in both
languages. The length of the chains of embedded CSCs is as variable as that of
argument-related CSCs, which often form sequences of adjacent collocations for
verbs with multiple arguments such as integrare, convergere or montare. Even at
the level of remote constituents located in distinct sentences, we do not notice
any obvious difference. However, a difference seems to emerge precisely in the
case of verbal argument structures: much more frequently than in the English
corpus, these structures are transferred to verb nominalisations. This is, for
example, the case of irraggiamento su superficie inclinata (corresponding to the
verbal expression: irraggiare su superficie inclinata) or produzione di energia
elettrica da fonti rinnovabili (corresponding to the verbal expression: produrre
energia elettrica da fonti rinnovabili) (cf. Daille 2017 for an overview of syntactic
variation of this kind).
Finally, a brief quantitative analysis has been conducted to compare the col-
locational data in the two languages. The analysis has been applied in the two
languages to the 20 simple and complex terms of reference already illustrated.
For the embedded and argument-related constituents, the calculated value
was the maximum number of constituents found for the collocations of a certain
term, according to the following scheme:
term: PV
– two-word collocation: PV system
– three-word collocation: stand-alone PV system [embedded]; size a PV system [argument-related]
– four-word collocation: grid-connected and stand-alone PV system [embedded]; algorithm sizes a PV system [argument-related, adjacent]
– five-word collocation: algorithm sizes a stand-alone PV system [embedded + argument-related, adjacent] …
It is not useful to distinguish the two types of contexts, since, as shown in
the last word combination of the above example, embedded and argument-
related (sometimes adjacent) constituents are frequently mixed. A maximum
of five constituents has been tested: beyond this limit, no collocational combi-
nations have been found for the selected terms. In addition, as the number of
constituents increases, so does the difficulty of extracting candidate combina-
tions automatically, as their uniqueness in the corpus increases and they are
no longer detected by the system as collocation candidates. Table 3 displays
our results.
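The counting scheme illustrated for PV can be sketched in a few lines of Python. The sketch rests on an assumption inferred from the examples rather than stated in the text, namely that only lexical words count as constituents (so that "size a PV system" has three and "grid-connected and stand-alone PV system" has four); the stop list and function name are hypothetical.

```python
# Assumed stop list: function words that do not count as constituents.
FUNCTION_WORDS = {"a", "an", "the", "and", "or", "of", "for", "to", "in", "on"}


def count_constituents(collocation):
    """Count whitespace-separated tokens that are not function words."""
    tokens = collocation.lower().split()
    return sum(1 for token in tokens if token not in FUNCTION_WORDS)


# Collocations of the term PV taken from the scheme above.
collocations_for_pv = [
    "PV system",
    "stand-alone PV system",
    "size a PV system",
    "grid-connected and stand-alone PV system",
    "algorithm sizes a PV system",
    "algorithm sizes a stand-alone PV system",
]

counts = {c: count_constituents(c) for c in collocations_for_pv}
max_constituents = max(counts.values())
print(max_constituents)
```

The value recorded per term is the maximum over all of its collocations, so each of the 20 terms contributes to exactly one column of the comparison.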
Tab. 3: Comparison between the English and the Italian corpus with regard to the
distribution of embedded and argument-related constituents of specialised collocations. The
context of reference is the phrase as well as the sentence level
Number of constituents in a specialised collocation:
      2            3             4            5
EN    2/20 (10%)   10/20 (50%)   6/20 (30%)   2/20 (10%)
IT    2/20 (10%)   13/20 (65%)   4/20 (20%)   1/20 (5%)
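As a simple consistency check, the percentages in Table 3 can be recomputed from the raw counts. The short Python sketch below uses only the figures given in the table; the variable and function names are hypothetical. Because each term is assigned its maximum number of constituents, every row should sum to the 20 terms analysed.

```python
TERMS = 20  # number of reference terms per language

# Raw counts from Table 3: number of constituents -> number of terms.
table3 = {
    "EN": {2: 2, 3: 10, 4: 6, 5: 2},
    "IT": {2: 2, 3: 13, 4: 4, 5: 1},
}


def shares(counts, total=TERMS):
    """Convert raw counts into rounded percentages of the term set."""
    return {n: round(100 * c / total) for n, c in counts.items()}


for language, counts in table3.items():
    # Each term contributes exactly one maximum, so rows sum to TERMS.
    assert sum(counts.values()) == TERMS
    print(language, shares(counts))
```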
For constituents of the remote type, i.e. traceable in different sentences, the
choice has been made to calculate the distance, in terms of sentences, between
anaphoric pairs within the discourse, focusing on the repetition of a collocative
constituent as such or through a pro-form. These two cases have not been distinguished
from each other. Since the same anaphoric pair can be found at different
distances at different points in the corpus, it can be counted more than
once. Table 4 shows the results of this second quantitative assessment.
Tab. 4: Comparison between the English and the Italian corpus with regard to the distribution
of remote constituents of specialised collocations⁴
Distance in terms of number of sentences:
      0            1             2             >2
EN    8/20 (40%)   15/20 (75%)   10/20 (50%)   9/20 (45%)
IT    7/20 (35%)   18/20 (90%)   10/20 (50%)   8/20 (40%)
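The distance measure behind Table 4 can be sketched as follows. This is a hedged illustration, not the procedure applied to the corpus: it assumes pre-segmented sentences, matches explicit repetitions only (pro-forms are ignored), and uses a hypothetical function name. The two sentences are taken from the handbook example quoted in Table 5.

```python
def repetition_distances(sentences, constituent):
    """Return the set of sentence distances between successive occurrences.

    Following the convention of Table 4, a constituent that occurs in one
    sentence only yields {0}; a constituent that never occurs yields an
    empty set.
    """
    needle = constituent.lower()
    positions = [i for i, s in enumerate(sentences) if needle in s.lower()]
    if len(positions) < 2:
        return {0} if positions else set()
    return {later - earlier for earlier, later in zip(positions, positions[1:])}


sentences = [
    "Solar PV systems require minimal maintenance, as they do not usually "
    "have moving parts.",
    "However, routine maintenance is required to ensure the solar PV system "
    "will continue to perform properly.",
]

print(repetition_distances(sentences, "maintenance"))
print(repetition_distances(sentences, "moving parts"))
```

Since the function returns a set, one and the same term can fall into several distance columns, which is why the rows of Table 4 do not sum to 20.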
The amount of data observed is too small to draw relevant conclusions, but it helps
to hypothesise trends that could be tested in the future. From this point of view, it is
useful to look at the percentage data in the two tables for the two languages, which
show very similar results. In both languages, though with a slight predominance
for Italian, the number of constituents of a specialised collocation found most often
within the sentence is three, followed by four. Above the sentence level, most collocational
constituents tend to be repeated in the next sentence or after two sentences.
Fewer than half of the selected terms are not subject to any type of anaphoric
repetition; nearly half are also repeated in more distant sentences. The observations made so
far on the contextual data of specialised collocations will now be used to discuss
the treatment of collocational contexts in specialised dictionaries.
4 Presenting collocational context in LSP dictionaries
In this section we will focus on possible ways of presenting the different colloca-
tional contexts in LSP dictionaries, providing guidelines for implementing obser-
vations made on corpus data. As pointed out by Gouws (2015: 184), “the inclusion
of complex collocations remains important and lexicographers should negotiate
the best possible way of presenting them and of making users aware of their exist-
ence”. As a consequence, this need also involves the presentation of collocations
in different contexts. In existing LSP resources the focus of presentation gener-
ally falls on predominantly binary specialised collocations, for which usually no
context is given or, at most, some usage examples are provided.
4 The context of reference is beyond the sentence level. The distance in terms of sentences has to
be understood as follows: 0 = a constituent is not found in a different sentence, 1 = a constituent
is found in the next sentence, 2 = a constituent is found two sentences later, and so on.
The variety of contexts brought to light by our analysis makes us reflect on
the need to give these phenomena greater weight in lexicography. Providing the
dictionary user with detailed data on the contexts of use of specialised collocations
is highly likely to support the text production function of the dictionary.
These data can be placed within the microstructure of the dictionary in
a dedicated section or can systematically replace generic usage examples.
Based on the example of PV system, a very frequent collocation from the
English corpus, an entry draft will now be presented in which the lexicographic
items related to contextual knowledge will be highlighted (Table 5). PV system
serves in this entry as a lemma, although the term could alternatively be pre-
sented as a collocation of the lemma PV together with other collocations such as
PV cell, PV module and PV array.
Tab. 5: Entry draft for the term PV system containing lexicographic items related to the
contextual properties of the term
PV system
n. (↑PV, photovoltaic)
DEFINITION: A photovoltaic (PV) system is a technology that converts solar radiation into
electric current. […]
COLLOCATIONS IN CONTEXT:
– PHRASE LEVEL
NOUN PHRASE w/ PRE-MODIFIER:
grid-connected PV system
stand-alone PV system
solar PV system
Without batteries, a grid-connected PV system will shut down when a utility power outage
occurs. [Bhatia: Course]
‣[further examples]
PREPOSITIONAL PHRASE / COMPOUND:
PV system of … kWp
≈ … kWp PV system
MPPT (Maximum Power Point Tracking) for a PV system
design of a PV system
≈ PV system design → to design
installation of a PV system
≈ PV system installation → to install
components of a PV system
≈ PV system components
PV system efficiency
PV system performance
Batteries consume energy during charging and discharging, reducing the efficiency and
output of the PV system by about 10 percent for lead-acid batteries. [Bhatia: Course]
‣[further examples]
VERB PHRASE:
to install a PV system (on a roof) → installation
to design a PV system → design
(an algorithm) sizes a PV system
a PV system generates power
a PV system delivers power
When designing the PV system, potential problems such as sulphation, stratification and
freezing should be considered and avoided. [Bhatia: Course]
‣[further examples]
– DISCOURSE LEVEL
It is generally accepted that the installation of a typical roof-mounted PV system presents
a very small increased risk of a direct lightning strike. However, this may not necessarily
be the case where the PV system is particularly large, where the PV system is installed on
the top of a tall building, where the PV system becomes the tallest structure in the vicinity,
or where the PV system is installed in an open area such as a field. [eca: Installation guide.]
Solar PV systems require minimal maintenance, as they do not usually have moving
parts. However, routine maintenance is required to ensure the solar PV system will
continue to perform properly. [eca: Handbook.]
Before starting any PV system testing: (hard hat and eye protection recommended)
1. Check that non-current carrying metal parts are grounded properly. (array frames,
racks, metal boxes, etc. are connected to the grounding system)
2. Ensure that all labels and safety signs specified in the plans are in place.
3. Verify that all disconnect switches (from the main AC disconnect all the way through to
the combiner fuse switches) are in the open position and tag each box with a warning
sign to signify that work on the PV system is in progress. [CEC: Installation guide.]
This study has shown that specialised collocations form a continuum of con-
stituents that fit into contexts of varying length (cf. also Wahl and Gries 2018 for
a study of multi-word expressions of increasing length). If the collocational range
(McIntosh 1966) of a specialised term is exhausted after a certain number of con-
stituents, as can be inferred from the results presented in Section 3.2, it is neces-
sary to establish, early in the lexicographic process, what the spatial limit in the
representation of the phraseological continuum in question can or should be.
From the perspective of textual production both in the mother tongue and
in the foreign language, as well as of ‘active’ translation, the availability in the
dictionary entry of typical contexts, more or less extended depending on the loca-
tion, can be crucial. It is reasonable to assume, therefore, that flexibility in the
coverage of such contexts is beneficial. Moreover, the typicality of the contexts
can be measured in terms of frequency and strength of association in the corpus,
obviously adapting the statistical validation thresholds of the candidate colloca-
tions as one gradually moves on to more extensive and thus per se (much) less
frequent combinations.
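By way of illustration, the following Python sketch shows how an association score with length-dependent thresholds might be applied. It is an assumption of this example, not a statement about the study, that logDice serves as the association measure; the threshold values are invented placeholders, and the names are hypothetical.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical thresholds that relax as combinations get longer (and rarer).
MIN_SCORE_BY_LENGTH = {2: 7.0, 3: 6.0, 4: 5.0, 5: 4.0}


def is_candidate(score, n_constituents):
    """Accept a combination if its score reaches the threshold for its length."""
    return score >= MIN_SCORE_BY_LENGTH[n_constituents]


# Invented frequencies: co-occurrence 8, node 32, collocate 32.
score = log_dice(8, 32, 32)
print(score, is_candidate(score, 2))
```

The design choice illustrated here is simply that the acceptance threshold is a function of the number of constituents, so that longer combinations are not discarded merely because they are less frequent.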
The proposed microstructure contains a specific search zone dedicated to the
different contexts of use of specialised collocations. Thinking of the ideal user
of the LSP dictionary as a translator or technical writer with good metalinguistic
skills, we have chosen to mark these contexts with syntactic labels, as shown in
the abstract microstructure of the entry:
PHRASE LEVEL:
    NOUN PHRASE w/ PRE-MODIFIER     ≈ embedded constituents
        Collocations
        Example(s)/ Source/ Genre
    NOUN PHRASE w/ POST-MODIFIER    ≈ embedded constituents
        Collocations
        Example(s)/ Source/ Genre
    VERB PHRASE                     ≈ argument-related (among which: adjacent) constituents
DISCOURSE LEVEL:                    ≈ remote constituents
    Example(s)/ Source/ Genre
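For an electronic dictionary, the abstract microstructure could be modelled as a small data structure. The Python sketch below is one possible encoding under stated assumptions: the class and field names are hypothetical, and only a few items from the PV system entry draft (Table 5) are filled in.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    text: str
    source: str
    genre: str = ""  # textual genre, left empty where not recorded


@dataclass
class ContextZone:
    label: str          # syntactic label shown to the dictionary user
    context_type: str   # analytical category used in the study
    collocations: list = field(default_factory=list)
    examples: list = field(default_factory=list)


@dataclass
class Entry:
    lemma: str
    zones: list = field(default_factory=list)


entry = Entry(
    lemma="PV system",
    zones=[
        ContextZone(
            label="NOUN PHRASE w/ PRE-MODIFIER",
            context_type="embedded",
            collocations=["grid-connected PV system", "stand-alone PV system",
                          "solar PV system"],
            examples=[Example(
                "Without batteries, a grid-connected PV system will shut down "
                "when a utility power outage occurs.",
                "Bhatia: Course",
            )],
        ),
        ContextZone(
            label="VERB PHRASE",
            context_type="argument-related",
            collocations=["to install a PV system", "to design a PV system"],
        ),
        ContextZone(label="DISCOURSE LEVEL", context_type="remote"),
    ],
)

labels = [zone.label for zone in entry.zones]
print(labels)
```

Keeping the user-facing label and the analytical category in separate fields reflects the decision to show syntactic labels in the entry while retaining the classification used in the analysis.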
We have chosen to indicate the various types of context by means of syntactic
tags, without resorting to the corresponding terminology (e.g. embedded,
argument-related, remote constituents) used in this study, which might not be
particularly user-friendly in a lexicographic environment. At the phrase level, collocates
of the lemma are highlighted and accompanied by less frequent collocates
indicated in round brackets (e.g. install... on a roof). Nominalisations of verbs
(e.g. installation of a PV system) are cross-referenced to the corresponding verb form
(to install) and vice versa. Equivalences between different syntactic
structures are also indicated, e.g. between a noun phrase with post-modifier and
a compound (e.g. design of a PV system and PV system design).
At the discourse level, the emphasis is on the ways in which the collocative
context is typically constructed in certain textual genres. Here, it is important to
highlight paradigmatic cases of explicit (or implicit) anaphora by means of term
repetition or pro-form (both underlined) with indication of the textual genre and
source (in parentheses). These context examples are very broad but not generic,
as they also focus on the collocative behaviour of terms. Each zone of the entry
should be integrated with further (linked) corpus examples. All examples in the
entry are followed by the indication of their source as well as of the textual genre.
The presented microstructural model can be varied in many ways, also
depending on the mode of publication. Nevertheless, it introduces elements that
are essential for the description of the possible context of the specialised colloca-
tions of a certain domain, such as
– the subdivision of specialised collocations not on the basis of each individual
syntactic structure, but of classes of contexts valid for both SSCs and CSCs;
– the possibility of expanding collocations on the basis of the concrete behav-
iour of the terms in the corpus, without imposing a predefined scope.
5 Conclusions
This paper has focused on the role of specialised phraseology, in particular col-
locations, in determining a significant part of the contexts in which domain ter-
minology is used. It contributes to corpus-based research on terminology and
phraseology by providing new insights into the formation and behaviour of n-ary
collocations in technical texts. Different contexts of simple and complex special-
ised collocations have been described. It is precisely the complex collocations
that turn out to be extremely interesting from this point of view, since they are
formed in accordance with different contexts. The possible contexts range from
the area of the phrase to that of the sentence (around a predicate) until they cross
the border of the sentence to develop in the text discourse.
A notion of collocation context has been directly inferred from the analysis
of corpus data: it is the portion of the text in which a specialised collocation occurs
while still being perceived as a phraseological unit. The context varies both with
the manner of collocation expansion from a simple to a complex collocation, and
with the anaphoric resumption of collocation constituents from one sentence to
subsequent sentences.
Apart from single phrases, in which collocations occur and expand in virtue
of syntactic-semantic restrictions and typicality, various factors seem to intervene
in the distribution of collocations above the sentence level, for example textual,
communicative and pragmatic factors, such as the structural coding conventions
of a textual genre already mentioned in Section 3, or possibly further functional
or discursive causes (cf. terminology employed by Freixa (2013) for describing
causes of term variation).
The limitations of the analysis carried out lie in the restricted possibilities
of detecting and thus automatically extracting complex collocations as well as
identifying remote components of collocations beyond the level of the individ-
ual sentence. In future work, new possibilities for extracting terms related to
discourse analysis (Widdowson 2008; Brezina 2018b; Loureda et al. 2019) could
be explored, including the contextual role of genuinely pragmatic aspects.
From a strictly computational point of view, the application of existing methodologies
for collocation identification, such as finite state transducers associated
with metagraphs (Tutin 2017), as well as the analysis of word and sentence
embeddings (cf., among others, Goldberg 2017; Reimers and Gurevych 2019),
might complement the current method by laying the groundwork for quantitative
analysis.
Further experiments in data collection and processing should be carried
out in new specialist areas to assess the general applicability of the model.
Likewise, the ordering strategies for lexicographic data should be further inves-
tigated, varying the structure presented in this contribution according to the
specific dictionary function and ideal user group, but also taking into account
the possibility of covering context information concerning bilingual or multi-
lingual data.
References
Adamzik, Kirsten. 2014. Textlinguistik: eine einführende Darstellung. Berlin: De Gruyter.
Baker, Paul. 2016. The shapes of collocation. International Journal of Corpus Linguistics, 21(2).
139–164.
Bartsch, Sabine. 2004. Structural and Functional Properties of Collocations in English.
Tübingen: Narr.
Brezina, Vaclav. 2018a. Collocation graphs and networks: Selected applications. In Pascual
Cantos-Gómez & Moisés Almela-Sánchez (eds.), Lexical collocation analysis (Quantitative
Methods in the Humanities and Social Sciences), 59–83. Cham: Springer.
Brezina, Vaclav. 2018b. Statistical choices in corpus-based discourse analysis. In Charlotte
Taylor & Anna Marchi (eds.), Corpus Approaches to Discourse, 259–280. London & New
York: Routledge.
Brezina, Vaclav, Pierre Weill-Tessier & Anthony McEnery. 2020. #LancsBox v.5.x. [software].
http://corpora.lancs.ac.uk/lancsbox/
Burger, Harald. 2015. Phraseologie: Eine Einführung am Beispiel des Deutschen (5., neu
bearbeitete Auflage). Berlin: Schmidt.
Caro Cedillo, Ana. 2004. Fachsprachliche Kollokationen: Ein übersetzungsorientiertes
Datenbankmodell Deutsch-Spanisch. Tübingen: Narr.
Corpas Pastor, Gloria & Jean-Pierre Colson. 2020. Introduction. In Gloria Corpas Pastor &
Jean-Pierre Colson (eds.), Computational Phraseology (IVITRA Research in Linguistics and
Literature, 24), 1–8. Amsterdam & Philadelphia: Benjamins.
De Beaugrande, Robert-Alain & Wolfgang U. Dressler. 1981. Einführung in die Textlinguistik.
Tübingen: Niemeyer.
Daille, Beatrice. 2017. Term Variation in Specialised Corpora: Characterisation, automatic
discovery and applications. Amsterdam & Philadelphia: Benjamins.
Evert, Stefan. 2009. Corpora and collocations. In Anke Lüdeling & Merja Kytö (eds.), Corpus
Linguistics. An International Handbook (Volume 2), 1212–1248. Berlin & New York: De
Gruyter Mouton.
Freixa, Judit. 2013. Otra vez sobre las causas de la variación denominativa. Debate
Terminológico 09. 38–46.
Giacomini, Laura. 2021. Phraseology in technical texts: A frame-based approach to multiword
term analysis and extraction. In Carmen Mellado Blanco (ed.), Productive Patterns in
Phraseology and Construction Grammar: A Multilingual Approach, 215–234. Berlin &
Boston: De Gruyter.
Giacomini, Laura, Paolo DiMuccio-Failla & Eva Lanzi. 2021. The interaction of argument
structures and complex collocations: role and challenges in learner’s lexicography. In
Proceedings of the EURALEX XIX International Conference, Alexandroupoli, 7–9 September,
285–293.
Gläser, Rosemarie. 2007. Fachphraseologie. In Harald Burger, Gerold Ungeheuer & Herbert
Ernst Wiegand (eds.), Handbücher zur Sprach- und Kommunikationswissenschaft /
Handbooks of Linguistics and Communication Science (HSK, Vol. 1), 482–505. Berlin: De
Gruyter.
Goldberg, Yoav. 2017. Neural Network Methods in Natural Language Processing. Synthesis
Lectures on Human Language Technologies (April 2017). San Rafael, CA: Morgan &
Claypool Publishers.
Goodwin, Charles & Alessandro Duranti. 1992. Rethinking context: an introduction. In Alessandro
Duranti & Charles Goodwin (eds.), Rethinking context: Language as an interactive
phenomenon, 1–42. Cambridge: Cambridge University Press.
Gouadec, Daniel. 1994. Nature et traitement des entités phraséologiques. In Terminologie
et phraséologie: acteurs et amenageurs; actes de la deuxième Université d’Automne en
Terminologie, Rennes 2, Septembre 1993, 167–193.
Gouws, Rufus H. 2015. The presentation and treatment of collocations as secondary guiding
elements in dictionaries. Lexikos 25. 170–190.
Heid, Ulrich. 1994. On ways words work together – research topics in lexical combinatorics.
In Proceedings of the VI EURALEX International Congress, Amsterdam, 30 August–3
September, 226–257.
Kilgarriff, Adam, Vit Baisa, Jan Bušta, Miloš Jakubíček, Vojtech Kovář, Jan Michelfeit, Pavel
Rychlý & Vit Suchomel. 2014. The sketch engine: Ten years on. Lexicography, 1(1).
7–36.
L’Homme, Marie-Claude & Daphnée Azoulay. 2020. Collecting collocations from general and
specialised corpora: A comparative analysis. In Gloria Corpas Pastor & Jean-Pierre Colson
(eds.), Computational Phraseology (IVITRA Research in Linguistics and Literature, 24),
151–176. Amsterdam & Philadelphia: Benjamins.
Loureda, Óscar, Inés Recio Fernández, Laura Nadal & Adriana Cruz (eds.). 2019. Empirical
studies of the construction of discourse. Amsterdam & Philadelphia: Benjamins.
Lyons, John. 1995. Text and discourse; context and co-text. In John Lyons (ed.), Linguistic
Semantics: An Introduction, 258–292. Cambridge: Cambridge University Press.
Lyons, John. 1977. Semantics. Cambridge: Cambridge University Press.
McIntosh, Angus. 1966. Patterns and ranges. Language 37. 325–337.
Reimers, Nils & Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese
bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural
Language Processing and the 9th International Joint Conference on Natural Language
Processing, 3982–3992. Hong Kong, China, November 3–7, 2019.
Seretan, Violeta. 2013. A multilingual integrated framework for processing lexical collocations.
In Adam Przepiórkowski (ed.), Computational Linguistics – Applications, 87–108.
Heidelberg & New York: Springer.
Sinclair, John McH. & Ronald Carter (eds.). 2004. Trust the text: Language, corpus and
discourse. London & New York: Routledge.
Tutin, Agnès. 2017. Annotating lexical functions in corpora: Showing collocations in context.
In Proceedings of the Second International Conference on the Meaning-Text Model,
498–510. Moscow: Slavic Culture Languages Publishing House.
Tutin, Agnès & Olivier Kraif. 2016. From binary collocations to grammatically extended
collocations: Some insights in the semantic field of emotions in French. Mémoires de la
Société néophilologique de Helsinki, Helsinki: Société néophilologique de Helsinki, 2016,
Collocations Cross-Linguistically. Corpora, Dictionaries and Language Teaching, 245–266.
Wahl, Alexander & Stefan Th. Gries. 2018. Multi-word Expressions: A Novel Computational
Approach to Their Bottom-Up Statistical Extraction. In Pascual Cantos-Gómez & Moisés
Almela-Sánchez (eds.), Lexical collocation analysis (Quantitative Methods in the
Humanities and Social Sciences), 85–110. Cham: Springer.
Widdowson, Henry George. 2008. Text, context, pretext: Critical issues in discourse analysis.
New York: John Wiley & Sons.

  • 1. DOI 10.1515/phras-2022-0005 YoP 2022; 13: 55–80 Laura Giacomini The contextual behaviour of specialised collocations: typology and lexicographic treatment Abstract: A corpus-based analysis of specialised phraseology can shed light on the role of phrasal context in terminology. This contribution describes the behav- iour of constituents of simple and complex specialised collocations in technical texts and the way in which these are distributed in different contexts ranging from the immediate surroundings of a node to the inclusion of a much larger portion of text. The contextual behaviour of specialised collocations is exemplified by English and Italian terms in the domain of photovoltaic technology. This contri- bution aims to identify and classify specialised collocations along their formation modalities and contexts, as well as to discuss the impact of these phenomena on their representation in LSP lexicographic resources. Issues and options concern- ing the extraction of collocations and contexts are also addressed. Keywords: phraseology, specialised collocations, complex collocations, technical texts, LSP dictionary 1 Introduction The present article deals with the topic of specialised collocations, with the aim of discussing the distribution of constituents of collocations in technical texts and the way in which this relates to a notion of context that ranges from the immediate surroundings of a node to the inclusion of a much larger portion of text. In order to do so, we will start with a rather broad concept of context as “a frame […] that surrounds a [focal] event being examined and provides resources for its appropri- ate interpretation” (Goodwin and Duranti 1992: 3; cf. also Goffman 1974 for the notion of frame) but without indulging in further distinctions between context and co-text (cf. Lyons 1995) of collocation constituents. 
In this way we want to contribute to research on specialised phraseology by presenting an overview of different types of collocational span (Evert 2009), by examining their interplay Laura Giacomini, Innsbruck University, Innsbruck, Austria, Laura.Giacomini@uibk.ac.at
  • 2. 56 Laura Giacomini with contextual properties and by discussing their impact on the (re)presentation of specialised collocations in LSP lexicographic resources. Despite being a comparatively underrepresented topic in terminology research, the study of specialised phraseology can provide valuable insight into the role of phrasal context in terminology (cf. Sinclair and Carter 2004 on the phrasal nature of language). As pointed out in Giacomini et al. (2020), complex collocations in particular should be the focus of attention in general and spe- cialised lexicography because of their high productivity and key phraseological significance (cf. also Gouws 2015: 184–185). The contextual behaviour of specialised collocations will be exemplified by English and Italian terms in the domain of photovoltaic technology. This domain has a rich terminology which interfaces with several disciplines, making special- ised collocations a varied phenomenon in their denotative content and termino- logical composition. Section 2 provides an operational, global definition which encompasses specialised collocations and multi-word terms and introduces a typology of specialised collocations, ranging from simple collocations to differ- ent forms of complex collocations. The typology reflects the formation processes of shorter or longer collocational sequences in technical texts. The contribution continues in Section 3 with the description of the English and the Italian corpora and of the behaviour of specialised collocations in context. The corresponding notion of context is inferred from the observation of corpus data and involves all the distinct textual components (e.g. phrase and sentence) in which the constituents of a specialised collocation may appear in a corpus. Accordingly, a typology of contexts is presented and discussed – also from the point of view of their identification in a corpus. 
Section 4 explores possibilities of presentation for the different collocational contexts in LSP lexicography and provides an entry draft for a dictionary on pho- tovoltaic terminology. The entry structure covers all available information con- cerning the contextual behaviour of the collocations of the lemma and proposes a method for systematically ordering this information. The article concludes with some reflections on the relevance of the results of the study and on the need to continue to explore specialised collocations in further domains in order to fully appreciate the role they play in shaping the context of specialised texts. 2 The nature of specialised collocations In general, an exclusive orientation of LSP collocation theory towards the col- location understanding typical of LGP phraseology appears problematic for a
  • 3. The contextual behaviour of specialised collocations 57 number of reasons, among which is the distribution of idiomatic phrasemes1 in specialised texts. The classification of phraseological data based on the prin- ciple of ‘idiomaticity’ seems to play a different role in LSP phraseology, where specialised collocations are often assigned to the set of non-idiomatic phrase- mes (cf. Gläser 2007: 487). Idiomaticity, i.e. the gradual feature of phraseologi- cal expressions that display a discrepancy between their literal and figurative meaning (Burger 2015), has a different status in LSP than in LGP and has often been described as a non-obligatory feature of specialised phrasemes (cf. Cedillo 2004: 91) in the same way as compositionality has been described as a non-cen- tral property in terminology (L’Homme and Azoulay 2020: 153). This is supposed to have a direct impact on the flexibility of the context in which phrasemes, col- locations in particular, occur in specialised texts. The degree of ‘fixedness’ is also a criterion for distinguishing different types of specialised phrasemes. It certainly affects the contextual behaviour of the least fixed phraseological units, namely specialised collocations, that widely lend themselves to paradigmatic substitutions and syntagmatic modifications. Paradigmatic substitutions are found, for example, in the frequent phenomenon of substantive class formation (Cedillo 2004), i.e. it is often possible to identify, for a given noun in the colloca- tion, a series of alternative nouns that are members of the same semantic field (cf. example a): a) current/overcurrent/lightning/shock/… protection Syntagmatic modifiability concerns the ability of a collocation to change its struc- ture by expanding it by means of new elements (cf. example b) and/or adapting it to new inflectional forms (cf. example c). 
b) current protection > reverse current protection c) battery lead > battery leads A further issue is the controversial status of complex terms, in particular multi- word terms (as opposed to specialised collocations) and compounds. Some authors trace a boundary between terminology and phraseology, ascribing to multi-word terms a purely naming function (“La terminologie désigne des objets et concepts alors que la phraséologie formule des relations.”, Gouadec 1994: 173; cf. also Gläser 2007: 494), whereas the function of specialised collocations would be the description of relations. This kind of strict duality also agrees with the view of multi-word terms and specialised collocations as covering complementary 1 ‘Phraseme’ is used in this contribution as a synonym of ‘phraseological unit’ (cf. Corpas Pastor and Colson 2020: 2).
  • 4. 58 Laura Giacomini syntactic patterns, with the former as typical noun phrases (e.g. fault current, or PV array mounting structure) and the latter as typical verb phrases (e.g. install- ing a PV array or to route cables). However, as pointed out by Cedillo (2004) and Giacomini (2021), separating word combinations based on their syntactic features leads to inconsistencies when trying to explain the semantic equiva- lence of variants of the kind V+(P)+N (protect against overcurrent) and N+(P)+N (overcurrent protection or protection against overcurrent). Among complex terms, compounds are, according to some researchers (cf. Burger 2015: 16), equally problematic from a phraseological standpoint since they do not meet the fun- damental criterion of polylexicality of phrasemes. A broader definition of ‘pol- ylexicality’ as a combinatorial property of lexical morphemes, not exclusively of words, would help overcome this obstacle and reflect the natural semantic equivalence of some compounds, e.g. (sth is) roof-mounted, and collocations, e.g. mount (sth) on the roof. In the context of this study, we will start from an operational definition of specialised collocation aimed at a lexicographic description of the phenomenon. By specialised collocation we mean a combination of two or more words that is typical of a specialised language, with unitary phraseological meaning and terminological character. The terminological character of the collocation is independent of the terminological or non- terminological character of its individual constituents. The degree of idiomaticity of the specialised collocation is variable, as is the degree of its fixedness. This is a comprehensive definition, which, relying on a phraseological but also empirical notion of collocation (cf., among others, Evert 2009 and Bartsch 2004), is aimed at providing a framework for the treatment of specialised collocations in a lexicographic resource. 
It allows considerable flexibility in the treatment of data extracted from corpora without precluding further restrictions, e.g. of multi- word terms in the narrow sense. 2.1 Simple and complex specialised collocations: a typology The notion of collocation as a binary combination is still very much entrenched in both common language and special language phraseological studies. The understanding of collocations as n-ary combinations has gained some popu- larity on the basis of corpus evidence in the recent past (cf., among others, Seretan 2013; Gouws 2015; Tutin and Kreif 2016) but the nature of complex collocations is still largely unexplored. This is very limiting with respect to the actual syntactic and semantic scope of the phenomenon, and it also con- strains the possibility of carrying out adequate contextual analysis. This view is
  • 5. The contextual behaviour of specialised collocations 59 likewise reflected in the treatment of specialised collocations in lexicographical and terminographical resources, in which the focus is primarily on two-word combinations. In order to best analyse the context in which the constituents of a special- ised collocation develop, it is necessary to introduce a typology of collocations that is not tied to a specific number of constituents or to specific restrictions on the context itself. The notion of possible context for specialised collocations will be inferred from the data we are going to examine and will not be defined in advance. By ‘simple specialised collocation’ (SSC) we mean a base collocation, terminologically not further decomposable and usually consisting of two ele- ments. As already pointed out, the constituents of any specialised collocation may or may not be terms, without thereby affecting the terminological character of their co-occurrence. Examples of SSCs in English are: photovoltaic system poly-crystalline junction box grid-connected system load to match the voltage amount of energy to charge a battery It is apparent from specialised texts, however, that also complex collocations play a role in the phraseology of a language, although they have been only margin- ally investigated in the past (cf. Giacomini et al. 2020). A study conducted in the context of learner’s lexicography on general language complex collocations in Italian and German has revealed the existence of two different kinds of complex collocation formation (ibid.): – The first type of complex collocation is built by recursive expansion. The recursive nature of collocations, described by some researchers as the prop- erty of collocation constituents to be collocational themselves (cf. Heid 1994; Seretan 2013), implies that a core collocational phrase is progressively expanded by the addition of new collocates. 
– The second type of complex collocation is built by argument complementa- rity, i.e. the concatenation of simple collocations of a verb matching two or more of its arguments. This type of formation is sometimes combined with the first one, depending on the collocational range of the constituents of simple collocations. The hypothesis to be tested in this contribution is that the model developed in the study on general language is transferable to specialised language and is also valid for English. The application of the descriptive model of complex col- locations to special language corpora will then serve to explore their contextual
  • 6. 60 Laura Giacomini features. A ‘complex specialised collocation’ (CSC) will be defined as a spe- cialised collocation derived from a SSC. Since its terminological value must by definition be preserved in the evolution from a simple collocation, in special- ised languages we usually find a lower number of complex collocations than in general language, where the only valid criterion for defining a complex collocation is syntactic, semantic and phraseological typicality. Based on the previously mentioned formation types for complex collocations, we will distin- guish also for LSP between ‘recursively built’ CSCs and ‘argument-related’ CSCs. Table 1 summarises the profiles of the discussed phrasemes and provides some examples in English and Italian. Tab. 1: Types of simple and complex specialised collocations and examples in English and Italian. For each CSC, the original SSC is shown, and the additional constituents in a) are underlined Specialised collocation: Main characteristics: English examples: Italian examples: Simple specialised collocation (SSC): Not further decomposable collocation, usually binary. renewable energy (SSC) energia raggiante (SSC) Complex specialised collocation (CSC): a) recursively built CSC b) argument-related CSC Expansion by the addition of a collocate, usually a modifier/ specification of a constituent; can also apply to a CSC. Concatenated collocations matching the arguments of a verb. match the voltage (SSC) match the nominal voltage (CSC) match the nominal voltage of the solar array (CSC) PV system (SSC) performance of the PV system (CSC) install + solar panel + roof to install (V) a solar panel (SSC, direct object) on a roof (locative) (CSC) modulo al silicio (SSC) modulo al silicio monocristallino (CSC) circuito aperto (SSC) tensione di circuito aperto (CSC) radiazione solare + incidere + superficie radiazione solare (SSC, subject) incidente (V) su una superficie (locative) (CSC)
  • 7. The contextual behaviour of specialised collocations 61 As highlighted by the examples in the table, argument-related complex colloca- tions can be constructed not only around verbs but also other word classes that hold arguments (e.g. nouns such as performance and modulo). The phraseological and terminological character of a collocation can be understood as a continuum that fades as the collocation expands and includes new constituents. This is also evidenced by the results of the extraction of terms and collocates, with candidates becoming less and less frequent and less and less specific according to association measures as the collocations expand. Therefore, a limit to the scope of the collocations under analysis will not be set in advance, since this limit will be assessed in each case based on corpus data. This aspect also plays a role in the presentation of collocations and contexts in LSP lexico- graphic resources (see Section 4). 3 Constituents of specialised collocations and their contextual behaviour The present study is corpus-based, since the context types of specialised colloca- tions are verified in a corpus according to the previously defined typology. Two small comparable corpora, Photovoltaics2021_en and Fotovoltaico2021_it, have been created for English and Italian, each comprising around 800,000 words and made up of handbooks and guidelines concerning the field of photovoltaic tech- nology. The texts, which have been manually collected among online resources, are addressed to technicians and prospective technicians and cover the topics of design and installation of photovoltaic systems. From a terminological point of view, corpus texts are characterized by a high degree of specialisation in both languages. The language of the photovoltaics domain, as part of the language related to renewable energies (e.g. hydroelectricity and wind power) is quite rich and com- bines terminology from base disciplines (e.g. 
physics, photochemistry, electro- chemistry), as well as from sister disciplines (e.g. the construction industry). As a result, the collocations themselves are quite varied in their denotative content and terminological composition. The English corpus shows a considerable amount of nominalization, verb forms are often passivised, and synthetic struc- tures like N-V or N-A compounds (e.g. roof-mounted, air-permeable) are frequent, as is fairly typical in technical texts.
  • 8. 62 Laura Giacomini The two corpora have been compiled in Sketch Engine (Kilgarriff et al. 2014) using the previously selected texts. Sketch Engine was also been used for the extraction of specialised collocations and the analysis of their contexts. Data are collected and examined first for English, and the overall result is then compared to the one obtained by analysing the Italian data. The aim is not to identify exact correspondences between the two languages at the level of distribution of dif- ferent specialised collocation types and corresponding contexts, but to test the applicability of the typology and method to different languages. The first step of the applied procedure involves the extraction of simple and complex terms using the Keywords tool. Candidate term lists are obtained based on the comparison between the two specialised corpora and the general lan- guage reference corpora English Web 2020 (enTenTen20) for English and Italian Web 2016 (itTenTen16) for Italian. They are then assigned a keyness score by the Simple maths method. 3.1 Specialised collocations and their contexts in the English corpus From the list of candidate terms extracted from the English corpus, the most rel- evant single-word and multi-word terms2 are validated and ten of them selected considering both the relative frequencies in the focus corpus and the reference corpus, as well as their keyness score. Single-word terms have been chosen in such a way that they are not constituents of the multi-word terms at the same time. 
The selected terms are shown hereunder in alphabetical order:

(1) amp-hour                   (2) charge controller
(3) deep cycle                 (4) efficiency
(5) electrical installation    (6) fault current
(7) grid-connected             (8) inverter
(9) junction box               (10) maximum power
(11) operate                   (12) photovoltaic system
(13) PV                        (14) roof
(15) shading                   (16) solar cell
(17) string fuse               (18) surge
(19) voltage drop              (20) wattage

2 According to the terminology used in Sketch Engine.

Starting from this list, the contexts of the extracted terms and their collocations are examined. For this purpose, the Word Sketch and the Multiword Sketch tools
are used in combination with concordance analysis. The different types of contexts emerging for the collocations of the selected terms will now be presented and discussed by illustrating selected examples.

3.1.1 Embedded constituents in a phrase context

A new constituent is embedded within the collocation. The context is typically the restricted context of the noun phrase (less frequently: verb, adjective, prepositional phrase) in which the expansion occurs by means of single adjectival or adverbial modifiers and nouns. The result is a CSC built recursively from a SSC or a previous CSC. Embedded constituents (in parentheses) are found in the following CSCs:

(1) amp-hour: (total) amp-hour demand;
(2) charge controller: ((MPPT) solar) charge controller;
(3) deep cycle: deep cycle (battery);
(4) efficiency: (power) conversion efficiency;
(5) electrical installation: (requirements for the) electrical installation;
(6) fault current: (d.c.) fault current; fault current (protection);
(7) grid-connected: grid-connected ((PV) system);
(10) maximum power: maximum power (output);
(12) photovoltaic system: (roof-mounted) photovoltaic system; (connect a) photovoltaic system; photovoltaic (hybrid) system;
(13) PV: (solar) PV system, (off-grid) PV system, ((grid-connected and) stand-alone) PV system, (installation of a) PV system; PV (module) installation; (thin film) PV cell, PV cell (material), (crystalline) PV cell; (above roof) PV array;
(14) roof: (asymmetric) duopitched roof;
(17) string fuse: (removable) string fuse;
(18) surge: surge suppression (device);
(20) wattage: (combined) rated wattage

These phrasal contexts appear to be relatively stable, even though these collocations are sometimes expanded by the addition of free, non-collocational constituents according to the principle of syntagmatic modifiability described in Section 2.
Here are a few examples of this phenomenon, in which additional, non-collocational items have been highlighted by means of square brackets:
– maximum allowable residual load [available] for the solar array,
– [many] stand-alone inverters,
– [total] energy demand per day,
– [same] nominal voltage,
– load of the [existing] roof covering,
– [advantages of both] lead-calcium and lead-antimony design,
– [quality of these] renewable technology systems,
– silicone or [other] mastic sealant.

It should be noted, however, that constituents of word combinations characterised by a strong conceptual cohesion are not separated by the addition of free, non-collocational items. Such units of meaning are, for instance,
– junction box,
– fault current,
– voltage drop,
– short circuit,
– altitude correction factor,
but also the ones mentioned in the previous examples.

3.1.2 Argument-related constituents in a sentence (or VP) context

This type of phenomenon corresponds to argument-related CSCs, with collocations matching one or more arguments of a verb predicate. The context is therefore typically the verb phrase or the sentence. This phenomenon can also apply to nominalisations that maintain the argument structure of the original verb (cf. Section 3.2). Argument-related constituents are found in the following examples:

(6) fault current: prevent (d.c.) fault current;
(8) inverter: disconnect the inverter from the grid;
(9) junction box: wire the junction box;
(11) operate: interrupter operates correctly;
(12) photovoltaic system: connect a photovoltaic system;
(13) PV: size a ((grid-connected and) stand-alone) PV system, install a PV system, (electrically) connect a PV array;
(14) roof: mount on a (pitched) roof;
(15) shading: calculate the shading factor;
(16) solar cell: (photovoltaic) solar cell produces X volts;
(18) surge: protect from power surges;
(19) voltage drop: minimise voltage drop

As previously pointed out, constituents of argument-related CSCs can sometimes be recursively expanded depending on their collocational range (cf. prevent fault current → prevent d.c. fault current). In the abovementioned examples, embedded constituents are again indicated in parentheses.
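The parenthesis notation used in the example lists can be read as a compact record of recursive expansion: material at a deeper nesting level is added in a later expansion step. The following sketch shows one possible way of unfolding such a notation into the chain of collocations it stands for. The function name and the reading of nesting depth as expansion order are assumptions made for illustration only, and the sketch expects balanced parentheses.

```python
def expansions(notation: str) -> list[str]:
    """Unfold a nested-parenthesis notation into successive expansions.

    Characters outside any parentheses form the base combination; each
    further nesting level adds the constituents embedded at that level.
    """
    depth = 0
    max_depth = 0
    chars = []  # pairs of (character, nesting depth)
    for ch in notation:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
        else:
            chars.append((ch, depth))

    result = []
    for level in range(max_depth + 1):
        text = "".join(ch for ch, d in chars if d <= level)
        result.append(" ".join(text.split()))  # normalise whitespace
    return result


for step in expansions("((MPPT) solar) charge controller"):
    print(step)
```

Applied to prevent (d.c.) fault current, the same function yields the two-step chain given in the text, from the shorter to the longer combination.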
The described collocational and contextual behaviour is independent of the nature of the verbs involved. They are usually relatively general technical verbs that are very common in technical sublanguages and apply to a large number of entities, such as operate, minimise, prevent, disconnect, connect, install, produce, mount, charge, or cable. They primarily indicate dynamic situations, namely processes and events (Lyons 1977: 483). A few others, such as earth, ground, overload, insulate, or retrofit, seem to be more specific but not exclusive to the terminology of photovoltaic systems. A further class of verbs which are present throughout the corpus are generic, non-technical verbs, such as make, bring, use, or provide.

We can observe that the most frequent collocational structure is formed by the verb and a direct object. This reflects a typically neutral style of specialised technical communication, obtained by means of passivisation or other impersonal forms. A neutral style, however, can also be produced by subject-verb collocations such as the following:
– the interrupter operates correctly,
– fuses are not likely to operate under short-circuit conditions,
– when the earth fault interrupter operates, an alarm shall be initiated.

Whenever more than one argument of a verb is collocational in nature, concatenated, adjacent collocations are built, as in disconnect the inverter from the grid or photovoltaic solar cell produces X volts. Further examples of this kind are:
– ventilation prevents excessive heat build-up,
– d.c. isolator may be incorporated into the inverter,
– conductors should be suitably protected from mechanical damage,
– blocking diodes should be used in addition to string fuses,
– the amount of sunlight falling onto the face of the PV cell affects its output,
– the amount of energy produced by the array per day.
In some cases, coordinate verbs are able to build parallel CSCs:
– PV systems mounted above or integrated into a pitched roof.

Generally speaking, the displayed contexts appear to be looser than the ones found in Section 3.1.1. The connection between a verb and its arguments is sometimes interrupted by non-collocational items appearing within the context of the sentence. Some examples will now be mentioned:
– the amount of sunlight [hitting the array] [also] varies with…,
– the PV array is [typically] mounted on fixed racks,
– all d.c. constituents must be rated, [as a minimum], at Voltage: Voc(stc) x 1.15,
– manual load switching is [sometimes] provided,
– direct or diffuse light [(usually sunlight)] [shining on the solar cells] induces the photovoltaic effect.

Additional, non-collocational items have been highlighted by means of square brackets. They are, for instance, appositives, adverbials, or participle constructions with the function of a relative clause.

3.1.3 Remote constituents beyond the sentence level

Alongside embedded and argument-related constituents, which, as we have seen, correspond to different types of phrases and are more or less fixed in nature, we have postulated that there are also broader contexts, above the sentence level, in which the constituents of specialised collocations can be distributed. Indeed, we assume that specialised discourse, as it develops in a text, becomes homogeneous through textual cohesion and coherence (cf. De Beaugrande and Dressler 1981; Adamzik 2014). We attempted to test whether a text offers possibilities and modalities of distribution of lexical constituents of phrasemes beyond the sentence level, without, for the moment, investigating the causes of such distribution.³ In doing this, we expand the analysis of collocations from the initial set of 20 combinations listed at the beginning of this section to further validated combinations, in order to obtain a broader picture of the phenomenon.

By observing the behaviour of simple and complex specialised collocations in the English corpus, we notice specific patterns of use according to which at least one constituent is explicitly or implicitly echoed in different sentences, sometimes interspersed with further sentences. This is an anaphoric repetition associated with the collocative nature of some terms.
The following phenomena have been identified:

(a) A constituent of a simple or complex collocation is explicitly repeated as such in subsequent sentences, in which it is associated with further collocations, as in the first example below, while in the second example the anaphoric character of the second item is signalled by the determiner the:

A charge controller is connected in between the solar panels and the batteries. The charge controller operates automatically and ensures that the maximum output of the solar panels is directed to charge the batteries without overcharging or damaging them.

3 ‘Distribution’ is generically used to indicate the position and arrangement of collocation constituents throughout corpus texts, not in the sense of distributional semantics.
The inclination (or pitch) of the array is to be measured or determined from plan. The required value is the degrees from horizontal. Hence, an inclination of 0° represents a horizontal array; 90° represents a vertical array.

This happens very often in list-based text structures:

The approach is as follows:
1. Establish the electrical rating of the PV array in kilowatts peak (kWp)
2. Determine the postcode region
3. Determine the array pitch
4. Determine the array orientation
5. Look up kWh/kWp (Kk) from the appropriate location specific table
6. Determine the shading factor of the array (SF) according to any objects blocking the horizon – using shade factor procedure set out in 3.7.7

(b) A pronoun or adjective, usually personal or demonstrative, refers back to the constituent of a simple or complex collocation in a preceding sentence, and introduces a new collocation of that constituent.

PV specific plug and socket connectors are commonly fitted to module cables by the manufacturer. Such connectors provide a secure, durable and effective electrical contact. They also simplify and increase the safety of installation works.

Battery Backup Inverters: These are special inverters which are designed to draw energy from a battery, manage the battery charge via an onboard charger, and export excess energy to the utility grid. These inverters are capable of supplying AC energy to selected loads during a utility outage and are required to have anti-islanding protection.

…each layer extracts energy from each photon from a particular portion of the light spectrum that is bombarding the cell. This layering of the PV materials increases the overall efficiency…

This second modality seems to be less frequent in the specialised texts that make up our corpus than type (a).
For the sake of communicative clarity, the repetition of terms seems to be preferable to that of pronouns with anaphoric function, especially in distinct sentences, in which the distance between the pronoun and the antecedent reference could easily lead to semantic ambiguities.

(c) A constituent of a simple or complex collocation is explicitly reiterated after one or more sentences. The intention does not seem properly anaphoric, yet the same collocational constituent is found in different sentences without strong connection at the level of discourse.

Where the array frame is mounted on a domestic roof or similar, the likelihood of the frame being an extraneous-conductive-part is very low – due to the type and amount of material used between the ground and the roof structure (which will mainly be non-conductive). Even in the case of an array frame being mounted on a commercial building where mostly steelwork is used, it is likely that the frame will be either isolated, …
(d) In some cases, one constituent is implicitly reiterated in a later context, in which another constituent of the collocation appears. In the example mentioned below, battery capacity is a specialised collocation that can be identified by observing the structure of the context. We will also call this phenomenon a subtype of anaphora, since the reprise of battery in e. Capacity is just implicit.

Battery Inputs and Specifications
a. Days of storage desired/required = 7 days
b. Depth-of-discharge limit (typical value) = 0.8
c. Make/ Model = Exide 6E95-11 (Deep cycle battery)
d. Battery cell voltage = 12 V
e. Capacity = 478 Amp-hour (Ah)
f. System voltage (battery bus voltage) = 24 V
g. Battery round trip efficiency = 0.85 for efficiency batteries.

These different forms of phraseological behaviour ‘distributed’ over several sentences seem to correspond to the typical structures of the textual genre in question, as well as to the typical contents and modes of technical writing, including descriptions of methods, processes and their individual steps, the repeated use of lexical elements in contiguous or distant sentences, and the schematic style of lists.

3.1.4 General remarks on the analysis of the context of specialised collocations

As indicated at the beginning of the section, specialised collocations have been extracted using the Word Sketch and the Multiword Sketch tools of Sketch Engine, progressively widening the scope of the analysed text section, but always remaining within the sentence boundaries. This makes it possible, in particular,
– to study the behaviour of recursively built and argument-related CSCs,
– to highlight their combinability, and
– to observe that recursively built CSCs typically correspond to multi-word terms in the strict sense.

Contextual analysis of the constituents of specialised collocations within a sentence reveals unsurprising regularities.
As soon as larger portions of the text are considered, however, the picture becomes considerably more complicated. Due to their textually irregular distribution, remote constituents beyond the sentence level are clearly more difficult to detect in the corpus than embedded and argument-related constituents. A first phase of the analysis has been
carried out manually on a part of the texts in order to identify regularities in the contextual behaviour of the specialised collocations. In a second phase, the observations made have been applied to a semi-automatic procedure, with validated simple collocates searched within textual structures larger than the sentence, i.e. paragraphs and documents, by means of the Corpus Query Language, such as in the following example:

"junction" "box" []* "junction" "box" !within <s/>

The queries are particularly challenging when dealing with anaphoric pro-forms, which are not easy to predict. In comparison with the results obtained by means of Multiword Sketches, no better results have been achieved by analysing collocation graphs through the #LancsBox tool (Brezina et al. 2020; cf. also Baker 2016; Brezina 2018a).

We have not focused on quantitative data for the time being, as we believe that the frequency of use of the latter phenomenon is closely linked to the terminology of the specialised field and the conventions of the textual genre, rather than to strictly phraseological factors. We have therefore limited ourselves to observing the types of contexts by describing them from a qualitative point of view. A quantitative analysis, on the other hand, may be of interest for comparing the two corpora and drawing preliminary conclusions on the contextual behaviour of specialised collocations in English and Italian (see Section 3.2).

In Giacomini et al. (2020), the notion of ‘conceptual range’, which refers to the syntactic level at which the concept of a complex collocation is encoded, was introduced. Complex collocations built by recursive expansion retain the same properties as the simple collocations from which they originate.
A noun phrase, for instance, is expanded into a larger noun phrase by the addition of an adjectival modifier, or a verb phrase is expanded into a larger verb phrase by the addition of an adverbial modifier. The concept encoded by the complex collocation is specified at the phrase level. Concepts covered by argument-related complex collocations, on the contrary, are encoded at sentence level (or at least at verb phrase level). This level is also able to identify complex ‘scenes’ when all syntactic arguments of a verb are involved in a sequence of collocations.

As shown by the previous analysis of specialised collocations, the idea of conceptual range is also applicable to terminology and is useful for inferring a notion of collocation context from the data. The context of specialised collocations is thus understood as a frame that surrounds a focal event (Goodwin and Duranti 1992) and, specifically, as the portion of text in which components of specialised collocations appear while still being perceived as a phraseological unit, which varies both with the manner of expansion from simple to complex collocations and with the explicit or implicit anaphoric resumption of constituents from one sentence to subsequent sentences.
3.2 Comparative application to the Italian corpus

The Italian corpus of texts on photovoltaics is comparable in size and composition with the English corpus. It has been surveyed for the same phenomena as described in Section 3.1. Analysis has been carried out on the following set of terms, listed in alphabetical order:

(1) cella fotovoltaica         (2) corrente continua
(3) diodo di bypass            (4) energia elettrica
(5) fonti rinnovabili          (6) fotoelettrico
(7) FV                         (8) generatore fotovoltaico
(9) impianto fotovoltaico      (10) installare
(11) irraggiamento             (12) kW
(13) massima potenza           (14) modulo fotovoltaico
(15) nominale                  (16) ombreggiamento
(17) radiazione solare         (18) retrofit
(19) silicio                   (20) voltaggio

Table 2 presents some examples for each category of context.

Tab. 2: Examples of contextual phenomena regarding specialised collocations in the Italian corpus

Embedded constituents in a phrase context
– potenza nominale variabile
– cella fotovoltaica al silicio
– fonti rinnovabili tradizionali
– punto di massima potenza
– energia elettrica e termica
– potenza di … kW
– sistema FV autonomo
– voltaggio di funzionamento
– radiazione solare al suolo
– effetto fotoelettrico della luce solare

Argument-related constituents in a sentence (or VP) context
– massimizzare l’irraggiamento solare
– convergere la radiazione solare su una cella fotovoltaica
– montare un diodo di bypass
– generare energia elettrica
– misurare la corrente continua all’uscita dal generatore fotovoltaico
– integrare un modulo fotovoltaico nella copertura
– progettare e installare un impianto fotovoltaico
– irraggiamento su superficie inclinata
– fenomeni di ombreggiamento del campo fotovoltaico
– applicazione retrofit in facciata
– produzione di energia elettrica da fonti rinnovabili
Remote constituents beyond the sentence level

Lo schema sintetizza le possibili configurazioni che caratterizzano un impianto FV. In esso sono presenti cinque insiemi, composti ciascuno da diversi elementi, che in varie configurazioni caratterizzano le tipologie di impianto.

Per quanto riguarda la tecnologia, la quota di produzione di celle al silicio è in crescita e resta la predominante con il 94,2% del totale prodotto. Il silicio multicristallino con il 56,9% del mercato risulta essere il più utilizzato rispetto al monocristallino, all’amorfo e al film sottile. Tuttavia, nuova spinta sta avendo il silicio mono-cristallino […].

Prima di eseguire le misure si consigliano i seguenti controlli:
– verificare che ci siano condizioni di irraggiamento stabili e che non ci siano nuvole bianche in un cono di 60° di apertura intorno al sole che possano rendere instabili le misure di radiazione solare; […]
– evitare di fare verifiche tecniche-funzionali nelle giornate afose, al crescere del contenuto di umidità nell’aria aumenta la constituente di radiazione diffusa e di conseguenza il rendimento del campo fotovoltaico è più basso; un semplice espediente per capire se si è in presenza di umidità eccessiva nell’aria è quello di osservare la colorazione del cielo: se questo è di un bel blu la radiazione diffusa è molto bassa, più il colore del cielo tende al bianco più la constituente diffusa è elevata. […]
– verificare che ci sia una radiazione superiore a 600 W/m2; […]

The three context types of collocational constituents are widely present in both languages. The length of the chains of embedded CSCs is as variable as that of argument-related CSCs, which often form sequences of adjacent collocations for verbs with multiple arguments such as integrare, convergere or montare. Even at the level of remote constituents located in distinct sentences, we do not notice any obvious difference.
However, a difference seems to emerge precisely in the case of verbal argument structures: much more frequently than in the English corpus, these structures are transferred to verb nominalisations. This is, for example, the case of irraggiamento su superficie inclinata (corresponding to the verbal expression: irraggiare su superficie inclinata) or produzione di energia elettrica da fonti rinnovabili (corresponding to the verbal expression: produrre energia elettrica da fonti rinnovabili) (cf. Daille 2017 for an overview of syntactic variation of this kind).

Finally, a brief quantitative analysis has been conducted to compare the collocational data in the two languages. The analysis has been applied in the two languages to the 20 simple and complex terms of reference already illustrated.
For the embedded and argument-related constituents, the calculated value was the maximum number of constituents found for the collocations of a certain term, according to the following scheme:

term: PV
two-word collocation: PV system
three-word collocation: stand-alone PV system [embedded]; size a PV system [argument-related]
four-word collocation: grid-connected and stand-alone PV system [embedded]; algorithm sizes a PV system [argument-related, adjacent]
five-word collocation: algorithm sizes a stand-alone PV system [embedded + argument-related, adjacent]
…

It is not useful to distinguish the two types of contexts, since, as shown in the last word combination of the above example, embedded and argument-related (sometimes adjacent) constituents are frequently mixed. A maximum of five constituents has been tested: beyond this limit, no collocational combinations have been found for the selected terms. In addition, as the number of constituents increases, so does the difficulty of extracting candidate combinations automatically, as their uniqueness in the corpus increases and they are no longer detected by the system as collocation candidates. Table 3 displays our results.

Tab. 3: Comparison between the English and the Italian corpus with regard to the distribution of embedded and argument-related constituents of specialised collocations. The context of reference is the phrase as well as the sentence level

Number of constituents in a specialised collocation:
        2             3              4             5
EN      2/20 (10%)    10/20 (50%)    6/20 (30%)    2/20 (10%)
IT      2/20 (10%)    13/20 (65%)    4/20 (20%)    1/20 (5%)

For constituents of the remote type, i.e. traceable in different sentences, the choice has been made to calculate the distance, in terms of sentences, between anaphoric pairs within the discourse, focusing on the repetition of a collocative constituent as such or through a pro-form. These two cases have not been distinguished from each other.
Since the same anaphoric pair can be found at different distances at different points in the corpus, it can be counted more than once. Table 4 shows the results of this second quantitative assessment.
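The distance measure for remote constituents can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it assumes that the text has already been split into sentences, it only captures explicit repetition of a constituent (pro-forms would require additional, manually validated patterns), and the function name is hypothetical. The two example sentences are taken from the charge controller example in Section 3.1.3.

```python
import re


def sentence_distances(sentences: list[str], term: str) -> list[int]:
    """Distances, in sentences, between consecutive occurrences of a term.

    A distance of 1 means the term is repeated in the next sentence,
    2 means it is repeated two sentences later, and so on.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = [i for i, sentence in enumerate(sentences) if pattern.search(sentence)]
    return [later - earlier for earlier, later in zip(hits, hits[1:])]


sentences = [
    "A charge controller is connected in between the solar panels and the batteries.",
    "The charge controller operates automatically and ensures that the maximum "
    "output of the solar panels is directed to charge the batteries without "
    "overcharging or damaging them.",
]
distances = sentence_distances(sentences, "charge controller")
print(distances)
```

Because the function returns one value per pair of consecutive occurrences, the same term can contribute to several distance classes, which mirrors the decision to count an anaphoric pair more than once.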
Tab. 4: Comparison between the English and the Italian corpus with regard to the distribution of remote constituents of specialised collocations⁴

Distance in terms of number of sentences:
        0             1              2              ≥2
EN      8/20 (40%)    15/20 (75%)    10/20 (50%)    9/20 (45%)
IT      7/20 (35%)    18/20 (90%)    10/20 (50%)    8/20 (40%)

The amount of data observed is too small to draw relevant conclusions, but it helps to hypothesise trends that could be tested in the future. From this point of view, it is useful to look at the percentage data in the two tables for the two languages, which show very similar results. In both languages, though with a slight predominance for Italian, the number of constituents of a specialised collocation found most often within the sentence is three, followed by four. Above the sentence level, most collocational constituents tend to be repeated in the next sentence or after two sentences. Less than half of the selected terms are not subject to any type of anaphoric repetition; nearly half occur in more distant sentences. The observations made so far on the contextual data of specialised collocations will now be used to make considerations on the treatment of collocational contexts in specialised dictionaries.

4 Presenting collocational context in LSP dictionaries

In this section we will focus on possible ways of presenting the different collocational contexts in LSP dictionaries, providing guidelines for implementing observations made on corpus data. As pointed out by Gouws (2015: 184), “the inclusion of complex collocations remains important and lexicographers should negotiate the best possible way of presenting them and of making users aware of their existence”. As a consequence, this need also involves the presentation of collocations in different contexts.
In existing LSP resources the focus of presentation generally falls on predominantly binary specialised collocations, for which usually no context is given or, at most, some usage examples are provided.

4 The context of reference is beyond the sentence level. The distance in terms of sentences has to be understood as follows: 0 = a constituent is not found in a different sentence, 1 = a constituent is found in the next sentence, 2 = a constituent is found two sentences later, and so on.
The variety of contexts brought to light by our analysis makes us reflect on the need to give these phenomena greater weight in lexicography. Providing the dictionary user with detailed data on the contexts of use of specialised collocations very likely supports the textual production function of the dictionary. These data can be located within the microstructure of the dictionary in a dedicated section or be systematically substituted for generic usage examples.

Based on the example of PV system, a very frequent collocation from the English corpus, an entry draft will now be presented in which the lexicographic items related to contextual knowledge will be highlighted (Table 5). PV system serves in this entry as a lemma, although the term could alternatively be presented as a collocation of the lemma PV together with other collocations such as PV cell, PV module and PV array.

Tab. 5: Entry draft for the term PV system containing lexicographic items related to the contextual properties of the term

PV system n. (↑PV, photovoltaic)

DEFINITION: A photovoltaic (PV) system is a technology that converts solar radiation into electric current. […]

COLLOCATIONS IN CONTEXT:

– PHRASE LEVEL

NOUN PHRASE w/ PRE-MODIFIER:
grid-connected PV system
stand-alone PV system
solar PV system
Without batteries, a grid-connected PV system will shut down when a utility power outage occurs. [Bhatia: Course]
‣[further examples]

PREPOSITIONAL PHRASE / COMPOUND:
PV system of … kWp ≈ … kWp PV system
MPPT (Maximum Power Point Tracking) for a PV system
design of a PV system ≈ PV system design → to design
installation of a PV system ≈ PV system installation → to install
components of a PV system ≈ PV system components
PV system efficiency
PV system performance
Batteries consume energy during charging and discharging, reducing the efficiency and output of the PV system by about 10 percent for lead-acid batteries. [Bhatia: Course]
‣[further examples]

VERB PHRASE:
to install a PV system (on a roof) → installation
to design a PV system → design
(an algorithm) sizes a PV system
a PV system generates power
a PV system delivers power
When designing the PV system, potential problems such as sulphation, stratification and freezing should be considered and avoided. [Bhatia: Course]
‣[further examples]

– DISCOURSE LEVEL

It is generally accepted that the installation of a typical roof-mounted PV system presents a very small increased risk of a direct lightning strike. However, this may not necessarily be the case where the PV system is particularly large, where the PV system is installed on the top of a tall building, where the PV system becomes the tallest structure in the vicinity, or where the PV system is installed in an open area such as a field. [eca: Installation guide.]

Solar PV systems require minimal maintenance, as they do not usually have moving parts. However, routine maintenance is required to ensure the solar PV system will continue to perform properly. [eca: Handbook.]

Before starting any PV system testing: (hard hat and eye protection recommended)
1. Check that non-current carrying metal parts are grounded properly. (array frames, racks, metal boxes, etc. are connected to the grounding system)
2. Ensure that all labels and safety signs specified in the plans are in place.
3. Verify that all disconnect switches (from the main AC disconnect all the way through to the combiner fuse switches) are in the open position and tag each box with a warning sign to signify that work on the PV system is in progress. [CEC: Installation guide.]
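For an electronic dictionary, the entry draft in Table 5 could be stored as structured data. The sketch below is one possible encoding, given purely as an illustration: the class and field names are assumptions and not part of the proposed model, and only a subset of the collocations and examples from Table 5 is included.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    text: str
    source: str  # short source label as shown in the entry, e.g. "Bhatia: Course"


@dataclass
class ContextZone:
    label: str  # syntactic label displayed to the dictionary user
    collocations: list[str] = field(default_factory=list)
    examples: list[Example] = field(default_factory=list)


@dataclass
class Entry:
    lemma: str
    pos: str
    definition: str
    phrase_level: list[ContextZone] = field(default_factory=list)
    discourse_level: list[Example] = field(default_factory=list)


entry = Entry(
    lemma="PV system",
    pos="n.",
    definition="A photovoltaic (PV) system is a technology that converts "
               "solar radiation into electric current.",
    phrase_level=[
        ContextZone(
            label="NOUN PHRASE w/ PRE-MODIFIER",
            collocations=["grid-connected PV system", "stand-alone PV system",
                          "solar PV system"],
            examples=[Example(
                "Without batteries, a grid-connected PV system will shut down "
                "when a utility power outage occurs.", "Bhatia: Course")],
        ),
        ContextZone(
            label="PREPOSITIONAL PHRASE / COMPOUND",
            collocations=["design of a PV system", "PV system design"],
        ),
        ContextZone(
            label="VERB PHRASE",
            collocations=["to install a PV system (on a roof)",
                          "to design a PV system"],
        ),
    ],
    discourse_level=[Example(
        "Solar PV systems require minimal maintenance, as they do not usually "
        "have moving parts. However, routine maintenance is required to ensure "
        "the solar PV system will continue to perform properly.",
        "eca: Handbook")],
)

for zone in entry.phrase_level:
    print(zone.label, len(zone.collocations))
```

Keeping the phrase-level zones in an open list rather than as fixed fields leaves room for further zones to be added where the corpus data call for them.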
This study has shown that specialised collocations form a continuum of constituents that fit into contexts of varying length (cf. also Wahl and Gries 2018 for a study of multi-word expressions of increasing length). If the collocational range (McIntosh 1966) of a specialised term is exhausted after a certain number of constituents, as can be inferred from the results presented in Section 3.2, it is necessary to establish, early in the lexicographic process, what the spatial limit in the representation of the phraseological continuum in question can or should be.

From the perspective of textual production both in the mother tongue and in the foreign language, as well as of ‘active’ translation, the availability in the dictionary entry of typical contexts, more or less extended depending on the location, can be crucial. It is reasonable to assume, therefore, that flexibility in the coverage of such contexts is beneficial. Moreover, the typicality of the contexts can be measured in terms of frequency and strength of association in the corpus, obviously adapting the statistical validation thresholds of the candidate collocations as one gradually moves on to more extensive and thus per se (much) less frequent combinations.

The proposed microstructure contains a specific search zone dedicated to the different contexts of use of specialised collocations.
Thinking of the ideal user of the LSP dictionary as a translator or technical writer with good metalinguistic skills, we have chosen to mark these contexts with syntactic labels, as shown in the abstract microstructure of the entry:

PHRASE LEVEL:
NOUN PHRASE w/ PRE-MODIFIER ≈ embedded constituents
Collocations
Example(s)/ Source/ Genre
NOUN PHRASE w/ POST-MODIFIER ≈ embedded constituents
Collocations
Example(s)/ Source/ Genre
VERB PHRASE ≈ argument-related (among which: adjacent) constituents
DISCOURSE LEVEL: ≈ remote constituents
Example(s)/ Source/ Genre

The various types of context are thus indicated by means of syntactic tags rather than by the corresponding terminology used in this study (e.g. embedded, argument-related, remote constituents), which might not be particularly user-friendly in a lexicographic environment. At the phrase level, collocates of the lemma are highlighted and accompanied by less frequent collocates indicated in round brackets (e.g. install... on a roof). Nominalisations of verbs (e.g. components of a PV system) are referenced to the corresponding verb form
(PV system component) and vice versa. Equivalences between different syntactic structures are also indicated, e.g. between a noun phrase with post-modifier and a compound (e.g. design of a PV system and PV system design). At the discourse level, the emphasis is on the ways in which the collocative context is typically constructed in certain textual genres. Here, it is important to highlight paradigmatic cases of explicit (or implicit) anaphora by means of term repetition or pro-form (both underlined) with indication of the textual genre and source (in parentheses). These context examples are very broad but not generic, as they also focus on the collocative behaviour of terms. Each zone of the entry should be integrated with further (linked) corpus examples. All examples in the entry are followed by the indication of their source as well as of the textual genre.

The presented microstructural model can be varied in many ways, also depending on the mode of publication. Nevertheless, it introduces elements that are essential for the description of the possible context of the specialised collocations of a certain domain, such as
– the subdivision of specialised collocations not on the basis of each individual syntactic structure, but of classes of contexts valid for both SSCs and CSCs;
– the possibility of expanding collocations on the basis of the concrete behaviour of the terms in the corpus, without imposing a predefined scope.

5 Conclusions

This paper has focused on the role of specialised phraseology, in particular collocations, in determining a significant part of the contexts in which domain terminology is used. It contributes to corpus-based research on terminology and phraseology by providing new insights into the formation and behaviour of n-ary collocations in technical texts. Different contexts of simple and complex specialised collocations have been described.
It is precisely the complex collocations that turn out to be extremely interesting from this point of view, since they are formed in accordance with different contexts. The possible contexts range from the area of the phrase to that of the sentence (around a predicate), and then cross the border of the sentence to develop in the text discourse. A notion of collocation context has been directly inferred from the analysis of corpus data: it is the portion of the text in which a specialised collocation occurs while still being perceived as a phraseological unit. The context varies both with the manner of collocation expansion from a simple to a complex collocation, and with the anaphoric resumption of collocation constituents from one sentence to subsequent sentences.
Apart from single phrases, in which collocations occur and expand in virtue of syntactic-semantic restrictions and typicality, various factors seem to intervene in the distribution of collocations above the sentence level, for example textual, communicative and pragmatic factors, such as the structural coding conventions of a textual genre already mentioned in Section 3, or possibly further functional or discursive causes (cf. terminology employed by Freixa (2013) for describing causes of term variation).

The limitations of the analysis carried out lie in the restricted possibilities of detecting and thus automatically extracting complex collocations as well as identifying remote components of collocations beyond the level of the individual sentence. In future work, new possibilities for extracting terms related to discourse analysis (Widdowson 2008; Brezina 2018b; Loureda et al. 2019) could be explored, including the contextual role of genuinely pragmatic aspects. From a genuinely computational point of view, the application of existing methodologies for collocation identification such as finite state transducers associated with metagraphs (Tutin 2017) as well as the analysis of word and sentence embeddings (cf., among others, Goldberg 2017, Reimers and Gurevych 2019), might complement the current method by laying the groundwork for quantitative analysis. Further experiments in data collection and processing should be carried out in new specialist areas to assess the general applicability of the model. Likewise, the ordering strategies for lexicographic data should be further investigated, varying the structure presented in this contribution according to the specific dictionary function and ideal user group, but also taking into account the possibility of covering context information concerning bilingual or multilingual data.

References

Adamzik, Kirsten. 2014. Textlinguistik: eine einführende Darstellung. Berlin: De Gruyter.
Baker, Paul. 2016. The shapes of collocation. International Journal of Corpus Linguistics, 21(2). 139–164.
Bartsch, Sabine. 2004. Structural and Functional Properties of Collocations in English. Tübingen: Narr.
Brezina, Vaclav. 2018a. Collocation graphs and networks: Selected applications. In Pascual Cantos-Gómez & Moisés Almela-Sánchez (eds.), Lexical collocation analysis (Quantitative Methods in the Humanities and Social Sciences), 59–83. Cham: Springer.
Brezina, Vaclav. 2018b. Statistical choices in corpus-based discourse analysis. In Charlotte Taylor & Anna Marchi (eds.), Corpus Approaches to Discourse, 259–280. London & New York: Routledge.
Brezina, Vaclav, Pierre Weill-Tessier & Anthony McEnery. 2020. #LancsBox v.5.x. [software]. http://corpora.lancs.ac.uk/lancsbox/
Burger, Harald. 2015. Phraseologie: Eine Einführung am Beispiel des Deutschen (5., neu bearbeitete Auflage). Berlin: Schmidt.
Caro Cedillo, Ana. 2004. Fachsprachliche Kollokationen: Ein übersetzungsorientiertes Datenbankmodell Deutsch-Spanisch. Tübingen: Narr.
Corpas Pastor, Gloria & Jean-Pierre Colson. 2020. Introduction. In Gloria Corpas Pastor & Jean-Pierre Colson (eds.), Computational Phraseology (IVITRA Research in Linguistics and Literature, 24), 1–8. Amsterdam & Philadelphia: Benjamins.
De Beaugrande, Robert-Alain & Wolfgang U. Dressler. 1981. Einführung in die Textlinguistik. Tübingen: Niemeyer.
Daille, Beatrice. 2017. Term Variation in Specialised Corpora: Characterisation, automatic discovery and applications. Amsterdam & Philadelphia: Benjamins.
Evert, Stefan. 2009. Corpora and collocations. In Anke Lüdeling & Merja Kytö (eds.), Corpus Linguistics. An International Handbook (Volume 2), 1212–1248. Berlin & New York: De Gruyter Mouton.
Freixa, Judit. 2013. Otra vez sobre las causas de la variación denominativa. Debate Terminológico 09. 38–46.
Giacomini, Laura. 2021. Phraseology in technical texts: A frame-based approach to multiword term analysis and extraction. In Carmen Mellado Blanco (ed.), Productive Patterns in Phraseology and Construction Grammar: A Multilingual Approach, 215–234. Berlin & Boston: De Gruyter.
Giacomini, Laura, Paolo DiMuccio-Failla & Eva Lanzi. 2021. The interaction of argument structures and complex collocations: role and challenges in learner’s lexicography. In Proceedings of the EURALEX XIX International Conference, Alexandroupoli, 7–9 September, 285–293.
Gläser, Rosemarie. 2007. Fachphraseologie.
In Harald Burger, Gerold Ungeheuer & Herbert Ernst Wiegand (eds.), Handbücher zur Sprach- und Kommunikationswissenschaft / Handbooks of Linguistics and Communication Science (HSK, Vol. 1), 482–505. Berlin: De Gruyter.
Goldberg, Yoav. 2017. Neural Network Methods in Natural Language Processing. Synthesis Lectures on Human Language Technologies (April 2017). San Rafael, CA: Morgan & Claypool Publishers.
Goodwin, Charles & Alessandro Duranti. 1992. Rethinking context: an introduction. In Alessandro Duranti & Charles Goodwin (eds.), Rethinking context: Language as an interactive phenomenon, 1–42. Cambridge: Cambridge University Press.
Gouadec, Daniel. 1994. Nature et traitement des entités phraséologiques. In Terminologie et phraséologie: acteurs et aménageurs; actes de la deuxième Université d’Automne en Terminologie, Rennes 2, Septembre 1993, 167–193.
Gouws, Rufus H. 2015. The presentation and treatment of collocations as secondary guiding elements in dictionaries. Lexikos 25. 170–190.
Heid, Ulrich. 1994. On ways words work together – research topics in lexical combinatorics. In Proceedings of the VI EURALEX International Congress, Amsterdam, 30 August–3 September, 226–257.
Kilgarriff, Adam, Vit Baisa, Jan Bušta, Miloš Jakubíček, Vojtech Kovář, Jan Michelfeit, Pavel Rychlý & Vit Suchomel. 2014. The sketch engine: Ten years on. Lexicography, 1(1). 7–36.
L’Homme, Marie-Claude & Daphnée Azoulay. 2020. Collecting collocations from general and specialised corpora: A comparative analysis. In Gloria Corpas Pastor & Jean-Pierre Colson (eds.), Computational Phraseology (IVITRA Research in Linguistics and Literature, 24), 151–176. Amsterdam & Philadelphia: Benjamins.
Loureda, Óscar, Inés Recio Fernández, Laura Nadal & Adriana Cruz (eds.). 2019. Empirical studies of the construction of discourse. Amsterdam & Philadelphia: Benjamins.
Lyons, John. 1995. Text and discourse; context and co-text. In John Lyons (ed.), Linguistic Semantics: An Introduction, 258–292. Cambridge: Cambridge University Press.
Lyons, John. 1977. Semantics. Cambridge: Cambridge University Press.
McIntosh, Angus. 1966. Patterns and ranges. Language 37. 325–337.
Reimers, Nils & Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 3982–3992. Hong Kong, China, November 3–7, 2019.
Seretan, Violeta. 2013. A multilingual integrated framework for processing lexical collocations. In Adam Przepiórkowski (ed.), Computational Linguistics – Applications, 87–108. Heidelberg & New York: Springer.
Sinclair, John McH. & Ronald Carter (eds.). 2004. Trust the text: Language, corpus and discourse. London & New York: Routledge.
Tutin, Agnès. 2017. Annotating lexical functions in corpora: Showing collocations in context. In Proceedings of the Second International Conference on the Meaning-Text Model, 498–510. Moscow: Slavic Culture Languages Publishing House.
Tutin, Agnès & Olivier Kraif. 2016. From binary collocations to grammatically extended collocations: Some insights in the semantic field of emotions in French. Mémoires de la Société néophilologique de Helsinki, Helsinki: Société néophilologique de Helsinki, 2016, Collocations Cross-Linguistically.
Corpora, Dictionaries and Language Teaching, 245–266.
Wahl, Alexander & Stefan Th. Gries. 2018. Multi-word Expressions: A Novel Computational Approach to Their Bottom-Up Statistical Extraction. In Pascual Cantos-Gómez & Moisés Almela-Sánchez (eds.), Lexical collocation analysis (Quantitative Methods in the Humanities and Social Sciences), 85–110. Cham: Springer.
Widdowson, Henry George. 2008. Text, context, pretext: Critical issues in discourse analysis. New York: John Wiley & Sons.