Grammarly AI-NLP Club #5 - Automatic text simplification in the biomedical domain - Natalia Grabar

Automatic text simplification in biomedical domain
Natalia Grabar
STL CNRS UMR8163, France
Grammarly, Kyiv, Ukraine: 21/08/2018
1/45 Automatic text simplification in biomedical domain Natalia Grabar

Background
Lviv University: Languages, Linguistics
Master, PhD (INaLCO, Université Paris 6): NLP, Medical area, Terminology
PostDoc, AHU (Inserm, Fondation HON Geneva): Information retrieval, Quality of information; Discourse analysis, Typology; Information for non-specialized users
Researcher (CNRS): Acquisition of lexical resources; Information for non-specialized users; Semantic annotation, Information extraction

Automatic text simplification in biomedical domain
work in French
1 Context
2 Detection of difficulties
3 Acquisition of paraphrases
4 Conclusion

Context
Evolution of the biomedical domain:
specific knowledge and terms
Different kinds of users:
medical staff, pharmacists, students, patients...
various levels of specialization
Patients: quality of information, understanding
technical level and understanding of health information
⇒ Close relation with health and well-being of people
(AMA, 1999; Berland et al., 2001; McCray, 2005; Tran et al.,
2009)

Readability of health documents
Health information must be: readable, understandable, usable
In different situations:
follow-up of treatments
decision making (chronic disorders)
communication with medical doctors
making the healthcare process successful
Real difficulty:
understanding the steps for correct drug intake (Patel
et al., 2002)
among 2,600 US patients (2 hospitals):
26% to 60% cannot understand instructions on drug intake,
informed consent, health brochures (Williams et al., 1995)
Documents and health websites designed for patients:
often highly technical (Berland et al., 2001)

Objective
Make health documents and medical terms easier for
patients to understand:
detect reading difficulties
propose common paraphrases for technical terms
[Figure: pipeline from Text to Simplified text through three steps (diagnosis of text, detection of difficult words, simplification/decoration), relying on reference models, resources and rules]
Interdisciplinary research:
linguistics, psychology, terminology, NLP...

Detection of difficulties

Detection of difficulties (documents)
Existing work
Text typology
Diagnosis of the text readability
Classical measures: Flesch (Flesch, 1948), Fog (Gunning,
1973)...
Computational measures:
classical measures and medical vocabulary (Kokkinakis &
Toporowska Gronostaj, 2006)
n-grams of characters (Poprat et al., 2006)
manual weighting of words (Zheng et al., 2002)
morphology (Chmielik & Grabar, 2009)
stylistic criteria (Grabar et al., 2007)
discursive criteria (Goeuriot et al., 2007)
various combinations (Wang, 2006; Zeng-Treiler et al., 2007;
Goeuriot et al., 2007; Leroy et al., 2008)
...
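As an illustration of the classical measures, the sketch below computes the Flesch Reading Ease and Gunning Fog scores with their standard English formulas. The syllable counter (groups of consecutive vowels) and the two sample sentences are simplifying assumptions, not part of the cited work, and the constants are calibrated for English rather than French.

```python
import re

def count_syllables(word):
    """Rough estimate: one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text):
    """Return (Flesch Reading Ease, Gunning Fog) for an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Fog counts "complex" words, here approximated as 3+ syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    words_per_sentence = len(words) / len(sentences)
    flesch = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables / len(words)
    fog = 0.4 * (words_per_sentence + 100 * complex_words / len(words))
    return flesch, fog

# Illustrative sentences (assumptions, not taken from the slides).
lay = "The heart is a muscle. It pumps blood."
expert = "Myocardial infarction necessitates immediate percutaneous angioplasty."
lay_flesch, lay_fog = readability(lay)
expert_flesch, expert_fog = readability(expert)
```

A higher Flesch score and a lower Fog score both indicate easier text, so the lay sentence should score as more readable on both measures.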

Detection of difficulties (documents)
Results (Chmielik & Grabar, 2009; Chmielik & Grabar, 2011)
[Figure: two plots with both axes ranging from 0 to 1; panels: lexical features, morphological features]
Decision trees C4.5 (Quinlan, 1993)
10-fold cross-validation

Detection of difficulties (words)
Existing work
Facilitators: hyphen (Bertram et al., 2011), space (Frisson
et al., 2008), morphological closeness (Lüttmann et al., 2011),
primes (Bozic et al., 2007; Beyersmann et al., 2012), pictures
(Dohmes et al., 2004; Koester & Schiller, 2011), etc.
Morphological head (Jarema et al., 1999; Libben et al., 2003)
NLP: challenges (Specia et al., 2012):
for a short text and a given word, several possible substitutions
that fit the context are proposed
→ rank the substitutions according to their simplicity
Descriptors:
Google n-grams, WordNet, word length, number of syllables,
mutual information, frequency...
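The ranking step can be sketched with two of the descriptors listed above, frequency and length. The frequency values, the candidate list and the 0.05 length weight below are hypothetical, chosen only for illustration; the systems in the cited challenge combine many more descriptors.

```python
import math

# Hypothetical corpus frequencies (occurrences per million words).
FREQUENCY = {
    "arthralgie": 0.4,
    "douleur articulaire": 12.0,
    "mal aux articulations": 3.5,
}

def simplicity_score(candidate, frequency, length_weight=0.05):
    """Frequent and short candidates are assumed to be simpler."""
    freq = frequency.get(candidate, 0.0)
    return math.log(freq + 1.0) - length_weight * len(candidate)

def rank_substitutions(candidates, frequency):
    """Return the candidates ordered from simplest to hardest."""
    return sorted(candidates,
                  key=lambda c: simplicity_score(c, frequency),
                  reverse=True)

ranked = rank_substitutions(list(FREQUENCY), FREQUENCY)
```

With these made-up values the lay expression ranks first and the technical term last.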

Psychology: eye-tracking (Grabar et al., 2018)
Eye-tracking:
recording eye movements when reading
Several indicators:
fixations: periods during which the eyes are stable (visual
information is analyzed)
saccades: rapid movements of eyes to move from one point to
another
regressions: backward movements
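A minimal sketch of how such indicators could be derived from a recorded trace, assuming each fixation has already been mapped to a word position and a duration. The trace format and the function name are assumptions for illustration, not the format used in the cited study.

```python
def reading_indicators(fixations):
    """fixations: list of (word_index, duration_ms) in temporal order."""
    # A saccade is the move between two consecutive fixations.
    moves = [b[0] - a[0] for a, b in zip(fixations, fixations[1:])]
    forward = [m for m in moves if m > 0]
    return {
        "n_fixations": len(fixations),
        "total_duration_ms": sum(d for _, d in fixations),
        # A regression is a backward movement to an earlier word.
        "n_regressions": sum(1 for m in moves if m < 0),
        "mean_forward_amplitude": sum(forward) / len(forward) if forward else 0.0,
    }

# Hypothetical trace: the reader goes back once, from word 3 to word 2.
trace = [(0, 210), (1, 180), (3, 250), (2, 300), (3, 190), (4, 220)]
indicators = reading_indicators(trace)
```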

Detection of difficulties: Eye-tracking
text1 (original version, then simplified version)
EXAMEN : ECHOGRAPHIE DES MAINS ET DES PIEDS
MOTIF : Bilan d’arthralgies
Mains : On ne visualise pas de ténosynovite, ou d’arthrosynovite.
Avant-pieds : On retrouve des remaniements intéressant les premières
métatarsophalangiennes en rapport avec des antécédents de chirurgie d’Hallux
valgus.
Absence d’arthrosynovite au niveau des articulations métatarsophalangiennes.
EXAMEN : ECHOGRAPHIE DES MAINS ET DES PIEDS
MOTIF : Bilan de douleurs articulaires
Mains : On ne visualise pas d’inflammation des tendons, ni de la membrane
articulaire.
Avant-pieds : On retrouve des remaniements intéressants sur les premières
articulations des pieds en rapport avec les antécédents de la chirurgie de la
déformation du pied.
Absence d’inflammation de la membrane au niveau des articulations du pied.

text2 (original version, then simplified version)
Cette patiente avait constitué un infarctus du myocarde antérieur en novembre
2010, pour lequel avait été réalisée une angioplastie de l’IVA moyenne avec
implantation d’un stent non actif Vision de 2.75 mm x 18 mm, un complément
par angioplastie au ballon seul en aval. Une endoprothèse avait également été
implantée au niveau de la circonflexe proximale, avec un stent Vision 2.5 x 18
mm. La fraction d’éjection était évaluée entre 35 et 40 %.
Nous l’avions revue récemment, en insuffisance cardiaque, avec plusieurs autres
problèmes :
- une anémie microcytaire inexpliquée,
- un déséquilibre important de son diabète pour lequel elle a été, entre temps,
prise en charge par nos confrères diabétologues.
Cette patiente avait présenté une crise cardiaque en novembre 2010, pour
laquelle avait été réalisée une intervention chirurgicale de l’artère cardiaque avec
implantation d’un stent non actif. Un autre stent avait également été implanté
au niveau d’une autre artère. La fraction d’éjection observée était basse.
Nous l’avions revue récemment, en insuffisance cardiaque, avec plusieurs autres
problèmes :
- une anémie inexpliquée,
- un déséquilibre important de son diabète pour lequel elle a été, entre temps,

Results on text1

Results on text2

Results
O: original version, S: simplified version, SD: difference O − S, df: degrees of freedom

text1
      O         S        SD       p     df   t-test
TRN   60.55     63.63    -3.08    0.23  45   1.22
CRL   58.88     62.06    -3.19    0.22  45   1.25
DPF   227.41    215.75   11.66    0.11  45   1.65
NTF   587.61    370.48   217.14   0.00  45   7.38
AMP   3.50      3.80     -0.30    0.02  45   2.44
REG   27.26     21.21    6.06     0.05  45   2.05
QCM   1304.35   869.57   434.78   0.02  21   2.08

text2
      O         S        SD       p     df   t-test
TRN   62.73     59.67    3.06     0.22  45   1.24
CRL   61.04     57.84    3.20     0.21  45   1.29
DPF   214.73    214.69   0.04     0.50  45   0.68
NTF   395.71    372.22   23.49    0.16  45   1.43
AMP   3.33      3.82     -0.49    0.00  45   5.38
REG   21.47     19.30    2.18     0.24  45   1.18
QCM   602.77    538.95   63.82    0.00  21   2.08

TRN, CRL: stable reading
DPF: no anticipation
NTF, AMP, REG: better significance on text1
QCM: better understanding with simplified versions

Detection of difficulties: NLP
(Grabar et al., 2014)
Medical words from Snomed International (Côté et al., 1993)
29,641 lemmatized words
Manually annotated:
by 3 independent annotators:
categories:
1 I can understand
2 I am not sure
3 I cannot understand
inter-annotator agreement: Cohen’s Kappa 0.736
NLP task: supervised categorization
automatically reproduce the manual annotations: F=0.90
24 descriptors:
syntactic and morphological information, reference lexica,
frequency, length, initial and final substrings, readability
scores...
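The agreement figure above is a Cohen's Kappa, which corrects the observed agreement for the agreement expected by chance. The sketch below computes it for one pair of annotators; the label sequences are invented for illustration, and how the three annotators were paired or averaged in the study is not stated on the slide.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    # Chance agreement: product of the two marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical annotations with the three categories of the slide:
# 1 = I can understand, 2 = I am not sure, 3 = I cannot understand.
annotator_1 = [1, 1, 2, 3, 3, 1, 2, 3, 1, 3]
annotator_2 = [1, 1, 2, 3, 2, 1, 2, 3, 1, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the two annotators agree on 8 items out of 10, chance agreement is 0.34, and the resulting Kappa is about 0.70.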

Detection of difficulties: NLP

Typology
abbreviations (OG, VG, PAPS, j, bat, cp);
proper names (Gougerot, Sjögren, Bentall, Glasgow, Babinski,
Barthel, Cockcroft);
drug names;
neoclassical compounds - disorders, procedures, treatments
(pseudohémophilie, sclérodermie, hydrolase, tympanectomie,
arthrodèse, synesthésie);
borrowings from Latin or English;
human anatomy (cloacal, pubovaginal, nasopharyngé, mitral,
antre, inguinal, strontium, érythème, maxillo-facial,
mésentère);
lab test results.

Acquisition of paraphrases

Existing work: general language
Revision of Simple Wikipedia articles (Yatskar et al., 2010):
probabilistic models and filters
between 1,079 and 2,970 pairs:
{stands for, is the same as}, {indigenous, native}
precision: 17% to 86%;
Methods from machine translation (Zhu et al., 2010; Wubben
et al., 2012):
parallel and aligned corpora (Wikipedia/Simple Wikipedia)
Distributional methods (Glavas & Stajner, 2015; Kim et al.,
2016):
monolingual corpora
vectors can contain equivalents that are easier to understand
filtering

Existing work: medical language
Automatic translator of medical terms to general language
(McCray et al., 1999):
MEDLINEplus (brochures)
Consumer Health Vocabulary (CHV) (Zeng & Tse, 2006)
collaborative approach
Morpho-syntactic variants (Deléger & Zweigenbaum, 2008;
Cartoni & Deléger, 2011):
{consommation régulière, consommer de façon régulière}
{gêne à la lecture, empêche de lire}
Social media specificities (Tapi Nzali et al., 2015):
misspellings
{cirrhose, cyrose}, {métastase, metastase}
reduced words
{oncologue, onco}, {chimiothérapie, chimio}

Definitions (Antoine & Grabar, 2017)
Reformulations (Antoine & Grabar, 2017)
Morphological composition (Grabar & Hamon, 2014; Grabar
& Hamon, 2016)

Definitions
Methods
Definition: structure with two elements:
definiendum (term to define) and definiens (the definition)
Myocarde est le tissu musculaire du coeur
Use of four patterns (Péry-Woodley & Rebeyrolle, 1998)
désigne (denotes)
est un (is a)
est appelé (is called)
peut être défini comme (can be defined as)
...with inflectional variants
Trigger: term
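The pattern-based extraction can be sketched with regular expressions. The four patterns follow the slide; the way inflectional variants are written (est/sont, un/une/des, appelé/appelée/appelés) and the function name are assumptions, and the actual system may delimit the definiens differently.

```python
import re

# One template per pattern; {term} is replaced by the trigger term.
PATTERNS = [
    r"{term}\s+(?:désigne|désignent)\s+(?P<definition>[^.]+)",
    r"{term}\s+(?:est|sont)\s+(?P<definition>(?:un|une|des)\s+[^.]+)",
    # With "est appelé", the definiens comes before the term.
    r"(?P<definition>[^.]+?)\s+(?:est|sont)\s+appelée?s?\s+{term}",
    r"{term}\s+peut\s+être\s+définie?\s+comme\s+(?P<definition>[^.]+)",
]

def extract_definitions(sentence, term):
    """Return (definiendum, definiens) pairs found for a given term."""
    results = []
    for template in PATTERNS:
        pattern = template.format(term=re.escape(term))
        match = re.search(pattern, sentence, flags=re.IGNORECASE)
        if match:
            results.append((term, match.group("definition").strip()))
    return results

found = extract_definitions(
    "L'hypoglycémie est un manque de sucre dans l'organisme.",
    "hypoglycémie",
)
```

Surface patterns of this kind also fire on sentences that are not definitions (for example "... de l'hypoglycémie sont des sueurs ..."), which is one reason a precision evaluation is needed.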

Definitions
Results
Extraction:
2,037 definitions
1,286 unique terms
Evaluation:
strict precision: 52.5%
correct definitions: 849
weak precision: 68%
correct and possibly correct definitions: 1,028
Types of terms:
compound terms:
hypoglycémie, acidocétose, angiographie, hypokaliémie,
affixed terms:
curetage, capsulite, arthrose, glaucome, durillon, pré-diabète,
non-constructed terms:
cataracte, impétigo, zona

Definitions
Results
L’hypoglycémie est un manque de sucre dans l’organisme
Une septicémie est un empoisonnement du sang du à un
microbe
Le curetage est un nettoyage en profondeur d’une gencive
inflammée
Pour un être humain adulte, une hypoglycémie est une
glycémie inférieure à 0,8 g/L
Les signes classiques annonciateurs de l’hypoglycémie sont des
sueurs, pâleur, palpitations, fringales en particulier
L’impétigo est une infection cutanée, qui provoque des
pustules qui dégénèrent en croûtes jaunâtres, l’impétigo est
due à...

Definitions
Results
Readability (péricarde):
+ La couche extérieure du cœur est appelée péricarde.
∼ Le péricarde est un sac à double paroi contenant le cœur et les
racines des gros vaisseaux sanguins.
− Le péricarde est un organe de glissement, formé de deux
feuillets limitant une cavité virtuelle, la cavité péricardique, qui
permet les mouvements cardiaques.

Reformulations
Motivation
Reformulation: saying something differently (Le Bot et al., 2008)
Occurrence of reformulations:
indicates presence of difficult words/terms
provides triggers for the extraction
Exploit reliable data:
health fora with moderators
Wikipedia

Reformulations
Methods
concept marker reformulation
vésiculaire, c’est-à-dire, venant de la vésicule biliaire
3 markers:
c’est-à-dire (that is to say)
autrement dit ; Autrement dit (in other words)
encore appelé(e)(s) (also called)
Pre-processing
POS-tagging and syntactic analysis by Cordial (Laurent et al.,
2009)
Trigger: markers
Extraction of concept and of reformulation:
syntactic information
boundaries: syntagms or propositions
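A simplified sketch of the marker-triggered extraction: it splits a text segment around one of the three markers. The real method uses POS-tagging and syntactic analysis to set the boundaries of the concept and of the reformulation; here the whole left and right contexts are taken, which is a deliberate simplification, and the function name is hypothetical.

```python
import re

# The three markers of the slide, with optional surrounding commas.
MARKER_RE = re.compile(
    r"\s*,?\s*(c['’]est-à-dire|autrement dit|encore appelée?s?)\s*,?\s*",
    flags=re.IGNORECASE,
)

def split_on_marker(segment):
    """Return (concept, marker, reformulation) or None if no marker is found."""
    match = MARKER_RE.search(segment)
    if match is None:
        return None
    concept = segment[:match.start()].strip()
    reformulation = segment[match.end():].strip(" .")
    return concept, match.group(1), reformulation

triple = split_on_marker("vésiculaire, c'est-à-dire, venant de la vésicule biliaire")
```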

Reformulations
form          lemma         POS       POSMT     GS     type GS  Prop
Vous          vous          PPER2P    Pp2.pn    1      S        1
ne            ne            ADV       Rpn       3—1    S        1
devez         devoir        VINDP2P   Vmip2p    3      V        1
pas           pas           ADV       Rgn       3      Q        1
employer      employer      VINF      Vmn–      5      D        2
de            de            PREP      Sp        7      D        2
savons        savon         NCMP      Ncmp      7      D        2
ou            ou            COO       Cc        7      F        2
des           de le         DETDPIG   Da-.p-i   10—7   F        2
laits         lait          NCMP      Ncmp      10—7   F        2
sophistiqués  sophistiqué   ADJMP     Afpmp     10—7   F        2
,             ,             PCTFAIB   Ypw       -      -        2
c’            ce            PDS       Pd-..-    13     N        2
est           est           ADV       Rgp       -      p        2
-à            à             PREP      Sp        16     F        2
-dire         dire          VINF      Vmn–      16     F        2
contenant     contenant     NCMS      Ncms      17     D        2
plusieurs     plusieurs     ADJIND    Dt-.p-    19     D        2
composants    composant     NCMP      Ncmp      19     D        2


Reformulations
Evaluation
Corpus     Dev.   Test
nb occ.    96     2,757
nb types   96     2,710
Match      P      R      F
exact      0.24   0.24   0.24
inexact    0.98   0.98   0.98
Difficulties:
detection of boundaries:
en c’est-à-dire au contact du sang circulant
une toxi-infection, c’est-à-dire, qu’ elle peut
semantics:
en 10 ans autrement dit sur 64 millions de personnes
un objectif c’est-à-dire une finalité

Reformulations
Results
des canaux galactophores c’est-à-dire sécrètent le lait
erratiques c’est-à-dire qu’ils changent de d’aspect et d’endroit
par une lithiase c’est-à-dire un caillou
clivage du moi c’est-à-dire comme une opposition entre le moi
et la réalité
au gré de la désintégration radioactive du 18 F c’est-à-dire
avec une demi-vie d’environ
un trouble de l’identité sexuelle c’est-à-dire qu’ils s’identifient
à un genre ne correspondant pas à leur sexe biologique
une enzyme protéolytique c’est-à-dire digère les protéines
comme le fait le suc pancréatique
celle de troubles fonctionnels intestinaux encore appelés
colopathie fonctionnelle

Morphological composition
[Figure: processing pipeline. Medical terms: POS-tagging, morphological analysis of components, translation. Corpus: POS-tagging, syntactic analysis. Then alignment and evaluation.]
Processing of terms
myocarde myocarde/Nom
[[[myo N*] [carde N*] NOM] ique ADJ]
myo=muscle, carde=coeur
Processing of corpus
Les causes de tachycardie ventriculaire sont superposables à celles des
extrasystoles ventriculaires: infarctus du myocarde, insuffisance cardiaque,
hypertrophie du muscle du cœur et prolapsus de la valve mitrale.
Same passage with the detected syntagms bracketed:
Les causes de tachycardie ventriculaire sont superposables à celles des
extrasystoles ventriculaires: infarctus du myocarde, insuffisance cardiaque,
[hypertrophie du [muscle du cœur]] et prolapsus de la valve mitrale.
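The alignment idea can be sketched as follows: the components of a term are translated into everyday words, and corpus syntagms containing all of these words are kept as candidate paraphrases. The gloss lexicon, the stopword list and the hand-made list of syntagms are assumptions for illustration; in the described pipeline they come from the morphological analysis and the syntactic analysis of the corpus.

```python
# Hypothetical gloss lexicon for neoclassical components.
COMPONENT_GLOSSES = {"myo": "muscle", "carde": "coeur"}
STOPWORDS = {"du", "de", "la", "le", "les", "des"}

def normalize(token):
    """Lowercase and unify the oe ligature so that cœur matches coeur."""
    return token.lower().replace("œ", "oe")

def align(components, syntagms):
    """Return (syntagm, is_full) pairs for syntagms covering every gloss.

    is_full is True when the syntagm contains no other content word.
    """
    glosses = {COMPONENT_GLOSSES[c] for c in components}
    aligned = []
    for syntagm in syntagms:
        content = {normalize(t) for t in syntagm.split()} - STOPWORDS
        if glosses <= content:
            aligned.append((syntagm, content == glosses))
    return aligned

syntagms = [
    "infarctus du myocarde",
    "hypertrophie du muscle du cœur",
    "muscle du cœur",
    "prolapsus de la valve mitrale",
]
paraphrases = align(["myo", "carde"], syntagms)
```

For myocarde, the sketch keeps "muscle du cœur" as a full match and "hypertrophie du muscle du cœur" as a syntagm that only partially corresponds to the term.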

Results
Alignment syntagm/term (percentage of alignment):
E1: full term and syntagm:
{myo pathie, maladie du muscle}
E2: full term, partial syntagm:
{myo pathie, maladie du muscle cardiaque}
E3: partial term, full syntagm:
{myopathie, la maladie}
E4: partial term and syntagm:
{myopathie, l’ origine de la maladie}

Evaluation
Nb of                    unigrams          bigrams           trigrams
                         b     l     s     b     l     s     b     l     s
correct paraphrases      549   785   644   378   517   461   195   290   257
poss. correct            39    32    67    22    45    75    10    19    41
processing of terms      47    60    44    28    28    46    9     10    26
incorrect paraphrases    33    146   296   64    80    380   25    39    148
Pstrict                  82    77    61    77    77    48    82    81    55
Pweak                    88    80    68    81    84    40    86    86    63
%incorrect               5     14    28    13    12    39    11    11    31
Evaluation:
strict precision 82 to 55%
weak precision 86 to 40%
error rate 5 to 39%
Resources
without: the best precision
morphology: good precision
synonymy: low precision
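The precision figures can be related to the counts as sketched below for the first column of the table (unigrams, b). The definitions used here (strict = correct / total, weak = (correct + possibly correct) / total, with the total being the sum of the four count rows) are an assumption inferred from the fact that they reproduce the values reported for that column.

```python
def precision_figures(correct, possibly_correct, term_processing, incorrect):
    """Return strict precision, weak precision and error rate, in percent."""
    total = correct + possibly_correct + term_processing + incorrect
    return {
        "strict": round(100 * correct / total),
        "weak": round(100 * (correct + possibly_correct) / total),
        "incorrect": round(100 * incorrect / total),
    }

# Counts for unigrams, column b, copied from the evaluation table.
figures = precision_figures(correct=549, possibly_correct=39,
                            term_processing=47, incorrect=33)
```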

Morphological analysis
Ambiguous analysis
[post [[uro N*] [graphie N*] NOM] NOM]
[[posturo N*] [graphie N*] NOM]
Incorrect analysis
sanglot: lot and sang
exotique: externe and oreille
divin: deux and vin (deux litres de vin)
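Both kinds of problem can be reproduced with a naive segmenter that splits a word into any sequence of known components. The component inventory below is a small hypothetical sample, not the resource used in the work.

```python
# Hypothetical inventory of components known to the analyzer.
COMPONENTS = {"post", "posturo", "uro", "graphie", "sang", "lot"}

def segmentations(word, components):
    """Return every way to split the word into known components."""
    if not word:
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        prefix = word[:end]
        if prefix in components:
            for rest in segmentations(word[end:], components):
                results.append([prefix] + rest)
    return results

ambiguous = segmentations("posturographie", COMPONENTS)
spurious = segmentations("sanglot", COMPONENTS)
```

The first call yields the two competing analyses shown above, and the second splits sanglot into sang and lot even though the word is not a compound.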

Extraction of paraphrases and their evaluation
Correct paraphrases
raw
{podalgie, douleur du pied}
{mastite, inflammation du sein}
{cystoprostatectomie, ablation de la vessie et de la prostate}
Morphology
{desmorrhexie, rupture des ligaments} (ligament→ligaments)
{bronchite, inflammation des bronches/inflammation
bronchique} (bronche→bronches, bronche→bronchique)
{dentalgie, douleurs dentaires} (dents→dentaires)
Synonymy
{aclasie, absence de fracture} (cassure→fracture)
{enterectomie, résection des intestins} (ablation→résection)

Extraction of paraphrases and their evaluation
Semantic relations between components:
well managed by data from corpora
errors: coordination/subordination
hematospermie: le sang ou le sperme, instead of
→ le sang dans le sperme
Non-compositional terms:
ostéodermie: peau and os, instead of
→ une structure d’écailles, de plaques osseuses ou d’autres
compositions dans les couches dermiques de la peau, comme
chez les lézards ou dinosaures

Comparison with existing work
work                            term type   nb. para      precision
(Zeng et al., 2006)             all         CHV
(Elhadad & Sutaria, 2007)       all         152           0.58
(Deléger & Zweigenbaum, 2008)   m-synt.     65, 82        0.67, 0.60
(Cartoni & Deléger, 2011)       m-synt.     109           0.66
definitions                     all         1,028         0.52, 0.68
morphology                      compounds   1,128         0.76, 0.86
abbreviations                   abbr.       42, 8,106     0.74/0.94
reformulation                   all         96, 2,710     0.24/0.98
parentheses                     all         305, 92,971   0.23/0.68
morpho-syntactic:
{consommation régulière, consommer de façon régulière}
comparable performance, better coverage

Comparison with existing work
DériF (Namer, 2003):
gloss in formal language for every analyzed word
our method: coverage depends on content of corpora
myocarde:
"(Partie de – Type particulier de) coeur en rapport avec le(s)
muscle"
muscle du coeur
desmorrhexie:
"rupture (du – liée au) ligament"
rupture des ligaments

Conclusion
Detection of difficulties
in reading and understanding
Acquisition of resources
for explaining technical terms
Methods dedicated to different kinds of linguistic phenomena
paraphrases, reformulations...
Exploitation of general language corpora
Complementary methods
Interesting and exploitable results
Work in French

Future work
Increase the coverage of paraphrases and reformulations:
more corpora
comparable (Cochrane, patient package inserts, Wiki/Viki)
monolingual
more suppletive resources
other methods for extracting the paraphrases
Alignment with medical terminologies
Distribution of the resource
Other languages
Lexical simplification of medical texts
ANR project CLEAR (Communication, Literacy, Education,
Accessibility, Readability)

AMA (1999).
Health literacy: report of the council on scientific affairs. Ad hoc committee on
health literacy for the council on scientific affairs, American Medical Association.
JAMA, 281(6), 552–7.
Antoine, E. & Grabar, N. (2017).
Acquisition of expert/non-expert vocabulary from reformulations.
In MIE, Stud Health Technol Inform. 235, pp. 521–525.
Berland, G., Elliott, M., Morales, L., Algazy, J., Kravitz, R.,
Broder, M., Kanouse, D., Munoz, J., Puyol, J., Lara, M. et al. (2001).
Health information on the internet: accessibility, quality, and readability in
English and Spanish.
JAMA, 285(20), 2612–2621.
Bertram, R., Kuperman, V., Baayen, H. R. & Hyönä, J. (2011).
The hyphen as a segmentation cue in triconstituent compound processing: It’s
getting better all the time.
Scandinavian Journal of Psychology, 52(6), 530–544.
Beyersmann, E., Coltheart, M. & Castles, A. (2012).
Parallel processing of whole words and morphemes in visual word recognition.
The Quarterly Journal of Experimental Psychology, 65(9), 1798–1819.
Bozic, M., Marslen-Wilson, W. D., Stamatakis, E. A., Davis, M. H. &
Tyler, L. K. (2007).
Differentiating morphology, form, and meaning: Neural correlates of
morphological complexity.
Journal of Cognitive Neuroscience, 19(9), 1464–1475.
Cartoni, B. & Deléger, L. (2011).
Découverte de patrons paraphrastiques en corpus comparable: une approche
basée sur les n-grammes.
In Traitement Automatique des Langues Naturelles (TALN).
Chmielik, J. & Grabar, N. (2009).
Comparative study between expert and non-expert biomedical writings: their
morphology and semantics.
Stud Health Technol Inform., 150, 359–63.
Chmielik, J. & Grabar, N. (2011).
Détection de la spécialisation scientifique et technique des documents
biomédicaux grâce aux informations morphologiques.
TAL, 51(2), 151–179.
Côté, R. A., Rothwell, D. J., Palotay, J. L., Beckett, R. S. &
Brochu, L. (1993).
The Systematised Nomenclature of Human and Veterinary Medicine: SNOMED
International.
Northfield: College of American Pathologists.
Deléger, L. & Zweigenbaum, P. (2008).
Paraphrase acquisition from comparable medical corpora of specialized and lay
texts.
In Ann Symp Am Med Inform Assoc (AMIA), pp. 146–50.
Dohmes, P., Zwitserlood, P. & Bölte, J. (2004).
The impact of semantic transparency of morphologically complex words on
picture naming.
Brain and Language, 90(1-3), 203–212.
Elhadad, N. & Sutaria, K. (2007).
Mining a lexicon of technical terms and lay equivalents.
In BioNLP, pp. 49–56.
Flesch, R. (1948).
A new readability yardstick.
Journ Appl Psychol, 23, 221–233.
Frisson, S., Niswander-Klement, E. & Pollatsek, A. (2008).
The role of semantic transparency in the processing of English compound words.
Br J Psychol, 99(1), 87–107.
Glavas, G. & Stajner, S. (2015).
Simplifying lexical simplification: Do we need simplified corpora?
In ACL-COLING, pp. 63–68.
Goeuriot, L., Grabar, N. & Daille, B. (2007).
Caractérisation des discours scientifique et vulgarisé en français, japonais et
russe.
In Traitement Automatique des Langues Naturelles (TALN), pp. 93–102.
Grabar, N., Farce, E. & Sparrow, L. (2018).
Étude de la lisibilité des documents de santé avec des méthodes d’oculométrie.
In Traitement Automatique des Langues Naturelles (TALN), pp. 1–14.

Grabar, N. & Hamon, T. (2014).
Automatic extraction of layman names for technical medical terms.
In ICHI 2014, Pavia, Italy.
Grabar, N. & Hamon, T. (2016).
Exploitation de la morphologie pour l’extraction automatique de paraphrases
grand public des termes médicaux.
TAL, 57(1), 85–109.
Grabar, N., Hamon, T. & Amiot, D. (2014).
Automatic diagnosis of understanding of medical words.
In EACL PITR Workshop, pp. 11–20.
Grabar, N., Krivine, S. & Jaulent, M. (2007).
Classification of health webpages as expert and non expert with a reduced set of
cross-language features.
Gunning, R. (1973).
The art of clear writing.
New York, NY: McGraw Hill.
Jarema, G., Busson, C., Nikolova, R., Tsapkini, K. & Libben, G. (1999).
Processing compounds: A cross-linguistic study.
Brain and Language, 68(1-2), 362–369.
Kim, Y.-S., Hullman, J., Burgess, M. & Adar, E. (2016).
Simplescience: Lexical simplification of scientific terminology.
In EMNLP, pp. 1–6.

Koester, D. & Schiller, N. O. (2011).
The functional neuroanatomy of morphology in language production.
NeuroImage, 55(2), 732–741.
Kokkinakis, D. & Toporowska Gronostaj, M. (2006).
Comparing lay and professional language in cardiovascular disorders corpora.
In A. Pham T., James Cook University, Ed., WSEAS Transactions on
BIOLOGY and BIOMEDICINE, pp. 429–437.
Laurent, D., Nègre, S. & Séguéla, P. (2009).
L’analyseur syntaxique Cordial dans Passage.
In Traitement Automatique des Langues Naturelles (TALN).
Le Bot, M.-C., Schuwer, M. & Élisabeth Richard (dir.) (2008).
La reformulation : Marqueurs linguistiques – Stratégies énonciatives.
Rennes: Rivages linguistiques.
Leroy, G., Helmreich, S., Cowie, J., Miller, T. & Zheng, W. (2008).
Evaluating online health information: Beyond readability formulas.
Libben, G., Gibson, M., Yoon, Y. B. & Sandra, D. (2003).
Compound fracture: The role of semantic transparency and morphological
headedness.
Brain and Language, 84(1), 50–64.
Lüttmann, H., Zwitserlood, P. & Bölte, J. (2011).
Sharing morphemes without sharing meaning: Production and comprehension of
German verbs in the context of morphological relatives.
Canadian Journal of Experimental Psychology/Revue canadienne de psychologie
expérimentale, 65(3), 173–191.
McCray, A. (2005).
Promoting health literacy.
J of Am Med Infor Ass, 12, 152–163.
McCray, A., Loane, R., Browne, A. & Bangalore, A. (1999).
Terminology issues in user access to web-based medical information.
Namer, F. (2003).
Automatiser l’analyse morpho-sémantique non affixale: le système DériF.
Cahiers de Grammaire, 28, 31–48.
Patel, V., Branch, T. & Arocha, J. (2002).
Errors in interpreting quantities as procedures : The case of pharmaceutical
labels.
Int Journ Med Inform, 65(3), 193–211.
Péry-Woodley, M. & Rebeyrolle, J. (1998).
Domain and genre in sublanguage text: definitional microtexts in three corpora.
In LREC, pp. 987–992.
Poprat, M., Markó, K. & Hahn, U. (2006).
A language classifier that automatically divides medical documents for experts
and health care consumers.
In Int Congress of the European Federation for Medical Informatics, pp.
503–508, Maastricht.
Quinlan, J. (1993).
C4.5 Programs for Machine Learning.
San Mateo, CA: Morgan Kaufmann.
Specia, L., Jauhar, S. & Mihalcea, R. (2012).
Semeval-2012 task 1: English lexical simplification.
In *SEM 2012, pp. 347–355.
Tapi Nzali, M., Bringay, S., Lavergne, C., Opitz, T., Azé, J. &
Mollevi, C. (2015).
Construction d’un vocabulaire patient/médecin dédié au cancer du sein à partir
des médias sociaux.
In IC 2015.
Tran, T., Chekroud, H., Thiery, P. & Julienne, A. (2009).
Internet et soins : un tiers invisible dans la relation médecine/patient ?
Ethica Clinica, 53, 34–43.
Wang, Y. (2006).
Automatic recognition of text difficulty from consumers health information.
In IEEE, Ed., Computer-Based Medical Systems, pp. 131–136.
Williams, M., Parker, R., Baker, D., Parikh, N., Pitkin, K., Coates,
W. & Nurss, J. (1995).
Inadequate functional health literacy among patients at two public hospitals.
JAMA, 274(21), 1677–1682.

Wubben, S., van den Bosch, A. & Krahmer, E. (2012).
Sentence simplification by monolingual machine translation.
In Annual Meeting of the Association for Computational Linguistics, pp.
1015–1024.
Yatskar, M., Pang, B., Danescu-Niculescu-Mizil, C. & Lee, L. (2010).
For the sake of simplicity: Unsupervised extraction of lexical simplifications from
Wikipedia.
In NAACL, pp. 365–368.
Zeng, Q. & Tse, T. (2006).
Exploring and developing consumer health vocabularies.
JAMIA, 13, 24–29.
Zeng, Q. T., Tse, T., Divita, G., Keselman, A., Crowell, J. & Browne,
A. C. (2006).
Exploring lexical forms: first-generation consumer health vocabularies.
Zeng-Treiler, Q., Kim, H., Goryachev, S., Keselman, A., Slaughter, L.
& Smith, C. (2007).
Text characteristics of clinical reports and their implications for the readability of
personal health records.
In MEDINFO, pp. 1117–1121, Brisbane, Australia.
Zheng, W., Milios, E. & Watters, C. (2002).
Filtering for medical news items using a machine learning approach.

Zhu, Z., Bernhard, D. & Gurevych, I. (2010).
A monolingual tree-based translation model for sentence simplification.
In COLING 2010, pp. 1353–1361.
