Context Difficulty Paraphrases Conclusion
Automatic text simplification in biomedical
domain
Natalia Grabar
STL CNRS UMR8163, France
Grammarly, Kyiv, Ukraine: 21/08/2018
1/45 Automatic text simplification in biomedical domain Natalia Grabar
Background
Lviv University: Languages, Linguistics
Master, PhD (INaLCO, Université Paris 6): NLP, Medical area, Terminology, Acquisition of lexical resources
PostDoc, AHU (Inserm, Fondation HON Geneva): Information retrieval, Quality of information, Discourse analysis, Typology, Information for non-specialized users
Researcher (CNRS): Information for non-specialized users, Semantic annotation, Information extraction
Automatic text simplification in biomedical domain
work in French
1 Context
2 Detection of difficulties
3 Acquisition of paraphrases
4 Conclusion
Context
Evolution of the biomedical domain:
specific knowledge and terms
Different kinds of users:
medical staff, pharmacists, students, patients...
various levels of specialization
Patients: quality of information, understanding
technicity and understanding of health information
⇒ Close relation with health and well-being of people
(AMA, 1999; Berland et al., 2001; McCray, 2005; Tran et al.,
2009)
Readability of health documents
Health information must be: readable, understandable, usable
In different situations:
follow treatments
make decisions (chronic disorders)
communicate with medical doctors
make the healthcare process successful
Real difficulty:
understanding the steps for correct drug intake (Patel
et al., 2002)
among 2,600 US patients (2 hospitals):
26% to 60% cannot understand instructions on drug intake,
informed consent forms, health brochures (Williams et al., 1995)
Documents, health websites designed for patients:
often highly technical (Berland et al., 2001)
Objective
Make health documents and medical terms easier for
patients to understand:
detect reading difficulties
propose common paraphrases for technical terms
[Figure: processing pipeline from "Text" to "Simplified text": diagnosis of text, detection of difficult words, simplification/decoration of difficult words; boxes labelled "ref. model", "ref. model", "res.", "rules"]
Interdisciplinary research:
linguistics, psychology, terminology, NLP...
Detection of difficulties
Detection of difficulties (documents)
Existing work
Text typology
Diagnosis of the text readability
Classical measures: Flesch (Flesch, 1948), Fog (Gunning,
1973)...
Computational measures:
classical measures and medical vocabulary (Kokkinakis &
Toporowska Gronostaj, 2006)
n-grams of characters (Poprat et al., 2006)
manual weighting of words (Zheng et al., 2002)
morphology (Chmielik & Grabar, 2009)
stylistic criteria (Grabar et al., 2007)
discursive criteria (Goeuriot et al., 2007)
various combinations (Wang, 2006; Zeng-Treitler et al., 2007;
Goeuriot et al., 2007; Leroy et al., 2008)
...
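As an illustration of the classical measures, here is a minimal sketch of the Flesch Reading Ease score. The constants are the standard ones for English; the syllable counter is a rough vowel-group heuristic and the function names are assumptions made for this example. The work presented here is on French, where an adapted formula and a proper syllabifier would be needed.

```python
import re

def count_syllables(word: str) -> int:
    """Rough English syllable count: number of vowel groups, at least 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

easy = "The cat sat on the mat."
hard = "Metatarsophalangeal arthrosynovitis was absent."
print(flesch_reading_ease(easy) > flesch_reading_ease(hard))  # prints: True
```

Long technical words raise the syllables-per-word ratio, which is why such formulas penalize medical vocabulary even in short sentences.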
Detection of difficulties (documents)
Results (Chmielik & Grabar, 2009; Chmielik & Grabar, 2011)
[Figure: two plots, each with both axes running from 0 to 1; panels: lexical features, morphological features]
Decision trees C4.5 (Quinlan, 1993)
10-fold cross-validation
Detection of difficulties (words)
Existing work
Facilitators: hyphen (Bertram et al., 2011), space (Frisson
et al., 2008), morphological closeness (Lüttmann et al., 2011),
primes (Bozic et al., 2007; Beyersmann et al., 2012), pictures
(Dohmes et al., 2004; Koester & Schiller, 2011), etc.
Morphological head (Jarema et al., 1999; Libben et al., 2003)
NLP: challenges (Specia et al., 2012):
for a short text and a given word, several substitutions
that fit the context are proposed
→ sort the substitutions according to their simplicity
Descriptors:
Google n-grams, WordNet, length of words, number of syllables,
mutual information, frequency...
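The ranking step can be sketched with two of the descriptors listed above, frequency and word length. This is only an illustrative baseline: the frequency table and the function names are hypothetical, and actual challenge systems combine many more descriptors.

```python
def simplicity_key(word, frequencies):
    """Sort key: more frequent words first, then shorter words."""
    return (-frequencies.get(word, 0), len(word))

def rank_substitutions(candidates, frequencies):
    """Order candidate substitutions from simplest to hardest."""
    return sorted(candidates, key=lambda w: simplicity_key(w, frequencies))

# Hypothetical corpus frequencies, for illustration only.
frequencies = {"pain": 900, "ache": 300, "arthralgia": 2}
print(rank_substitutions(["arthralgia", "ache", "pain"], frequencies))
# prints: ['pain', 'ache', 'arthralgia']
```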
Detection of difficulties
Psychology: eye-tracking (Grabar et al., 2018)
Eye-tracking:
recording eye movements when reading
Several indicators:
fixations: periods during which the eyes are stable (visual
information is analyzed)
saccades: rapid movements of eyes to move from one point to
another
regressions: backward movements
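A minimal sketch of how such indicators could be derived from a recording, assuming the eye-tracker output has already been reduced to a chronological list of fixations given as (word index, duration in ms). This data layout and the function name are assumptions made for this example, not the format used in the study.

```python
def reading_indicators(fixations):
    """Summarize a chronological list of (word_index, duration_ms) fixations."""
    moves = list(zip(fixations, fixations[1:]))  # one saccade between two fixations
    regressions = sum(1 for (src, _), (dst, _) in moves if dst < src)
    total_ms = sum(duration for _, duration in fixations)
    return {
        "n_fixations": len(fixations),
        "mean_fixation_ms": total_ms / len(fixations) if fixations else 0.0,
        "n_saccades": len(moves),
        "n_regressions": regressions,
    }

trace = [(0, 200), (1, 250), (3, 300), (2, 180), (4, 220)]
print(reading_indicators(trace))
```

In this toy trace the reader jumps from word 3 back to word 2, which counts as one regression.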
Detection of difficulties: Eye-tracking
text1
EXAMEN : ECHOGRAPHIE DES MAINS ET DES PIEDS
MOTIF : Bilan d’arthralgies
Mains : On ne visualise pas de ténosynovite, ou d’arthrosynovite.
Avant-pieds : On retrouve des remaniements intéressant les premières
métatarsophalangiennes en rapport avec des antécédents de chirurgie d’Hallux
valgus.
Absence d’arthrosynovite au niveau des articulations métatarsophalangiennes.

EXAMEN : ECHOGRAPHIE DES MAINS ET DES PIEDS
MOTIF : Bilan de douleurs articulaires
Mains : On ne visualise pas d’inflammation des tendons, ni de la membrane
articulaire.
Avant-pieds : On retrouve des remaniements intéressants sur les premières
articulations des pieds en rapport avec les antécédents de la chirurgie de la
déformation du pied.
Absence d’inflammation de la membrane au niveau des articulations du pied.
Detection of difficulties: Eye-tracking
text2
Cette patiente avait constitué un infarctus du myocarde antérieur en novembre
2010, pour lequel avait été réalisée une angioplastie de l’IVA moyenne avec
implantation d’un stent non actif Vision de 2.75 mm x 18 mm, un complément
par angioplastie au ballon seul en aval. Une endoprothèse avait également été
implantée au niveau de la circonflexe proximale, avec un stent Vision 2.5 x 18
mm. La fraction d’éjection était évaluée entre 35 et 40 %.
Nous l’avions revue récemment, en insuffisance cardiaque, avec plusieurs autres
problèmes :
- une anémie microcytaire inexpliquée,
- un déséquilibre important de son diabète pour lequel elle a été, entre temps,
prise en charge par nos confrères diabétologues.

Cette patiente avait présenté une crise cardiaque en novembre 2010, pour
laquelle avait été réalisée une intervention chirurgicale de l’artère cardiaque avec
implantation d’un stent non actif. Un autre stent avait également été implanté
au niveau d’une autre artère. La fraction d’éjection observée était basse.
Nous l’avions revue récemment, en insuffisance cardiaque, avec plusieurs autres
problèmes :
- une anémie inexpliquée,
- un déséquilibre important de son diabète pour lequel elle a été, entre temps,
Detection of difficulties: Eye-tracking
Results on text1
Detection of difficulties: Eye-tracking
Results on text2
Detection of difficulties: Eye-tracking
Results
       text1                                          text2
       O        S       SD      p     df  t-test      O       S       SD     p     df  t-test
TRN    60.55    63.63   -3.08   0.23  45  1.22        62.73   59.67   3.06   0.22  45  1.24
CRL    58.88    62.06   -3.19   0.22  45  1.25        61.04   57.84   3.20   0.21  45  1.29
DPF    227.41   215.75  11.66   0.11  45  1.65        214.73  214.69  0.04   0.50  45  0.68
NTF    587.61   370.48  217.14  0.00  45  7.38        395.71  372.22  23.49  0.16  45  1.43
AMP    3.50     3.80    -0.30   0.02  45  2.44        3.33    3.82    -0.49  0.00  45  5.38
REG    27.26    21.21   6.06    0.05  45  2.05        21.47   19.30   2.18   0.24  45  1.18
QCM    1304.35  869.57  434.78  0.02  21  2.08        602.77  538.95  63.82  0.00  21  2.08
TRN, CRL: stable reading
DPF: no anticipation
NTF, AMP, REG: better significance on text1
QCM: better understanding with simplified versions
Detection of difficulties: NLP
(Grabar et al., 2014)
Medical words from Snomed International (Côté et al., 1993)
29,641 lemmatized words
Manually annotated:
by 3 independent annotators:
categories:
1 I can understand
2 I am not sure
3 I cannot understand
inter-annotator agreement: Cohen’s Kappa 0.736
NLP task: supervised categorization
automatically reproduce the manual annotations: F=0.90
24 descriptors:
syntactic and morphological information, reference lexica,
frequency, length, initial and final substrings, readability
scores...
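For reference, Cohen's Kappa for one pair of annotators can be computed as below. This is a generic sketch of the standard formula, with a hypothetical function name and toy labels; how the three annotators were paired or averaged to obtain 0.736 is not stated on the slide.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa between two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each annotator's label distribution.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Toy annotations with the three categories (1, 2, 3) used above.
annotator_1 = [1, 1, 2, 3, 3, 1]
annotator_2 = [1, 2, 2, 3, 3, 1]
print(round(cohen_kappa(annotator_1, annotator_2), 2))  # prints: 0.75
```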
Detection of difficulties
Typology
abbreviations (OG, VG, PAPS, j, bat, cp);
proper names (Gougerot, Sjögren, Bentall, Glasgow, Babinski,
Barthel, Cockcroft);
drug names;
neoclassical compounds - disorders, procedures, treatments
(pseudohémophilie, sclérodermie, hydrolase, tympanectomie,
arthrodèse, synesthésie);
borrowings from Latin or English;
human anatomy (cloacal, pubovaginal, nasopharyngé, mitral,
antre, inguinal, strontium, érythème, maxillo-facial,
mésentère);
lab test results.
Acquisition of paraphrases
Acquisition of paraphrases
Existing work: general language
Revision of Simple Wikipedia articles (Yatskar et al., 2010):
probabilistic models and filters
between 1,079 and 2,970 pairs:
{stands for, is the same as}, {indigenous, native}
precision: 17% to 86%;
Methods from machine translation (Zhu et al., 2010; Wubben
et al., 2012):
parallel and aligned corpora (Wikipedia/Simple Wikipedia)
Distributional methods (Glavas & Stajner, 2015; Kim et al.,
2016):
monolingual corpora
vectors can contain equivalents easier to understand
filtering
Acquisition of paraphrases
Existing work: medical language
Automatic translator of medical terms to general language
(McCray et al., 1999):
MEDLINEplus (brochures)
Consumer Health Vocabulary (CHV) (Zeng & Tse, 2006)
collaborative approach
Morpho-syntactic variants (Deléger & Zweigenbaum, 2008;
Cartoni & Deléger, 2011):
{consommation régulière, consommer de façon régulière}
{gêne à la lecture, empêche de lire}
Social media specificities (Tapi Nzali et al., 2015):
misspellings
{cirrhose, cyrose}, {métastase, metastase}
reduced words
{oncologue, onco}, {chimiothérapie, chimio}
Acquisition of paraphrases
Definitions (Antoine & Grabar, 2017)
Reformulations (Antoine & Grabar, 2017)
Morphological composition (Grabar & Hamon, 2014; Grabar
& Hamon, 2016)
Definitions
Methods
Definition: structure with two elements:
definiendum (term to define) and definiens (the definition)
Myocarde est le tissu musculaire du coeur
Use of four patterns (Péry-Woodley & Rebeyrolle, 1998)
désigne (means)
est un (is a)
est appelé (is called)
peut être défini comme (can be defined as)
...with inflectional variants
Trigger: term
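The pattern-based extraction can be sketched with regular expressions. This is a simplified illustration, not the actual system: the regexes cover only a few inflectional variants, the article stripping is naive, and all names are assumptions. Note that with "est appelé" the term follows the pattern, while with the other patterns it precedes it.

```python
import re

# (pattern, term_first): term_first is False when the definiendum follows the pattern.
PATTERNS = [
    (re.compile(r"^(?P<left>.+?)\s+peut être définie? comme\s+(?P<right>.+?)\.?$", re.I), True),
    (re.compile(r"^(?P<left>.+?)\s+est appelée?\s+(?P<right>.+?)\.?$", re.I), False),
    (re.compile(r"^(?P<left>.+?)\s+désigne\s+(?P<right>.+?)\.?$", re.I), True),
    (re.compile(r"^(?P<left>.+?)\s+est\s+(?P<right>une?\s+.+?)\.?$", re.I), True),
]
LEADING_ARTICLE = re.compile(r"^(?:l['’]|le |la |les |un |une )", re.I)

def extract_definition(sentence):
    """Return (definiendum, definiens) or None if no pattern applies."""
    text = sentence.strip()
    for pattern, term_first in PATTERNS:
        match = pattern.match(text)
        if match:
            left, right = match.group("left"), match.group("right")
            term, definition = (left, right) if term_first else (right, left)
            return LEADING_ARTICLE.sub("", term), definition
    return None

print(extract_definition("L'hypoglycémie est un manque de sucre dans l'organisme"))
print(extract_definition("La couche extérieure du cœur est appelée péricarde."))
```

Such surface patterns also match sentences that are not definitions, which is consistent with the need for a manual evaluation of precision.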
Definitions
Results
Extraction:
2,037 definitions
1,286 unique terms
Evaluation:
strict precision: 52.5%
correct definitions: 849
weak precision: 68%
correct and possibly correct definitions: 1,028
Types of terms:
compound terms:
hypoglycémie, acidocétose, angiographie, hypokaliémie,
affixed terms:
curetage, capsulite, arthrose, glaucome, durillon, pré-diabète,
non-constructed terms:
cataracte, impétigo, zona
Definitions
Results
L’hypoglycémie est un manque de sucre dans l’organisme
Une septicémie est un empoisonnement du sang dû à un
microbe
Le curetage est un nettoyage en profondeur d’une gencive
inflammée
Pour un être humain adulte, une hypoglycémie est une
glycémie inférieure à 0,8 g/L
Les signes classiques annonciateurs de l’hypoglycémie sont des
sueurs, pâleur, palpitations, fringales en particulier
L’impétigo est une infection cutanée, qui provoque des
pustules qui dégénèrent en croûtes jaunâtres, l’impétigo est
due à...
Definitions
Results
Readability (péricarde):
+ La couche extérieure du cœur est appelée péricarde.
∼ Le péricarde est un sac à double paroi contenant le cœur et les
racines des gros vaisseaux sanguins.
− Le péricarde est un organe de glissement, formé de deux
feuillets limitant une cavité virtuelle, la cavité péricardique, qui
permet les mouvements cardiaques.
Reformulations
Motivation
Reformulation: say differently (Le Bot et al., 2008)
Occurrence of reformulations:
indicates presence of difficult words/terms
provides triggers for the extraction
Exploit reliable data:
health fora with moderators
Wikipedia
Reformulations
Methods
concept, marker, reformulation:
vésiculaire, c’est-à-dire, venant de la vésicule biliaire
3 markers:
c’est-à-dire (that is)
autrement dit ; Autrement dit (in other words)
encore appelé(e)(s) (also called)
Pre-processing
POS-tagging and syntactic analysis by Cordial (Laurent et al.,
2009)
Trigger: markers
Extraction of concept and of reformulation:
syntactic information
boundaries: syntagms or propositions
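A rough sketch of the marker-triggered split into concept and reformulation. It deliberately ignores the syntactic information used to set the boundaries and simply takes the text on each side of the marker, so it reproduces the kind of boundary errors discussed in the evaluation; the regex and function name are assumptions.

```python
import re

# The three markers, with optional surrounding commas and inflected "appelé(e)(s)".
MARKER_RE = re.compile(
    r"\s*,?\s*\b(c['’]est-à-dire|autrement dit|encore appelée?s?)\b\s*,?\s*",
    re.I,
)

def split_reformulation(segment):
    """Return (concept, marker, reformulation) or None if no marker is found."""
    match = MARKER_RE.search(segment)
    if match is None:
        return None
    concept = segment[:match.start()].strip(" ,")
    reformulation = segment[match.end():].strip(" ,.")
    return concept, match.group(1).lower(), reformulation

print(split_reformulation("vésiculaire, c'est-à-dire, venant de la vésicule biliaire"))
# prints: ('vésiculaire', "c'est-à-dire", 'venant de la vésicule biliaire')
```

Restricting each side to a syntagm or a proposition, as described above, is what requires the POS tags and syntactic groups produced in pre-processing.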
Reformulations
form          lemma        POS       POSMT     GS     type GS  Prop
Vous          vous         PPER2P    Pp2.pn    1      S        1
ne            ne           ADV       Rpn       3—1    S        1
devez         devoir       VINDP2P   Vmip2p    3      V        1
pas           pas          ADV       Rgn       3      Q        1
employer      employer     VINF      Vmn–      5      D        2
de            de           PREP      Sp        7      D        2
savons        savon        NCMP      Ncmp      7      D        2
ou            ou           COO       Cc        7      F        2
des           de le        DETDPIG   Da-.p-i   10—7   F        2
laits         lait         NCMP      Ncmp      10—7   F        2
sophistiqués  sophistiqué  ADJMP     Afpmp     10—7   F        2
,             ,            PCTFAIB   Ypw       -      -        2
c’            ce           PDS       Pd-..-    13     N        2
est           est          ADV       Rgp       -      p        2
-à            à            PREP      Sp        16     F        2
-dire         dire         VINF      Vmn–      16     F        2
contenant     contenant    NCMS      Ncms      17     D        2
plusieurs     plusieurs    ADJIND    Dt-.p-    19     D        2
composants    composant    NCMP      Ncmp      19     D        2
Reformulations
Evaluation
           Dev.  Test             P     R     F
nb occ.    96    2,757   exact    0.24  0.24  0.24
nb types   96    2,710   inexact  0.98  0.98  0.98
Difficulties:
detection of boundaries:
en c’est-à-dire au contact du sang circulant
une toxi-infection, c’est-à-dire, qu’ elle peut
semantics:
en 10 ans autrement dit sur 64 millions de personnes
un objectif c’est-à-dire une finalité
Reformulations
Results
des canaux galactophores c’est-à-dire sécrètent le lait
erratiques c’est-à-dire qu’ils changent de d’aspect et d’endroit
par une lithiase c’est-à-dire un caillou
clivage du moi c’est-à-dire comme une opposition entre le moi
et la réalité
au gré de la désintégration radioactive du 18 F c’est-à-dire
avec une demi-vie d’environ
un trouble de l’identité sexuelle c’est-à-dire qu’ils s’identifient
à un genre ne correspondant pas à leur sexe biologique
une enzyme protéolytique c’est-à-dire digère les protéines
comme le fait le suc pancréatique
celle de troubles fonctionnels intestinaux encore appelés
colopathie fonctionnelle
Morphological composition
[Figure: processing pipeline with boxes: Medical terms, POS-tagging, Morphological analysis of components, Translation; Corpus, POS-tagging, Syntactic analysis; Alignment; Evaluation]
Processing of terms
myocarde myocarde/Nom
[[[myo N*] [carde N*] NOM] ique ADJ]
myo=muscle, carde=coeur
Processing of corpus
Les causes de tachycardie ventriculaire sont superposables à celles des
extrasystoles ventriculaires: infarctus du myocarde, insuffisance cardiaque,
[hypertrophie du [muscle du cœur]] et prolapsus de la valve mitrale.
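The idea of the example above can be sketched as follows: decompose the term into components, translate each component into a common word, then look for the shortest corpus span that contains all the translations. The toy lexicon, the greedy segmentation and the window search are assumptions made for illustration; the actual method relies on a morphological analyzer and on syntactic groups rather than raw token windows.

```python
import re

# Hypothetical toy lexicon of neoclassical components and their French glosses.
COMPONENT_GLOSSES = {"myo": "muscle", "carde": "coeur", "pathie": "maladie"}

def decompose(term, lexicon):
    """Greedy left-to-right segmentation into known components, or None."""
    parts, position = [], 0
    candidates = sorted(lexicon, key=len, reverse=True)
    while position < len(term):
        component = next((c for c in candidates if term.startswith(c, position)), None)
        if component is None:
            return None
        parts.append(component)
        position += len(component)
    return parts

def normalize(token):
    return token.lower().replace("œ", "oe")

def find_paraphrase(term, sentence, lexicon, max_window=5):
    """Shortest token window of the sentence containing all component glosses."""
    parts = decompose(term, lexicon)
    if parts is None:
        return None
    targets = {lexicon[p] for p in parts}
    tokens = re.findall(r"\w+", sentence)
    normalized = [normalize(t) for t in tokens]
    for size in range(len(targets), max_window + 1):
        for start in range(len(tokens) - size + 1):
            if targets <= set(normalized[start:start + size]):
                return " ".join(tokens[start:start + size])
    return None

sentence = "hypertrophie du muscle du cœur et prolapsus de la valve mitrale"
print(find_paraphrase("myocarde", sentence, COMPONENT_GLOSSES))  # prints: muscle du cœur
```

A greedy segmentation like this one returns a single analysis, so it cannot represent terms that admit several decompositions.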
Morphological composition
Results
Alignment syntagm/term (percentage of alignment):
E1: full term and syntagm:
{myopathie, maladie du muscle}
E2: full term, partial syntagm:
{myopathie, maladie du muscle cardiaque}
E3: partial term, full syntagm:
{myopathie, la maladie}
E4: partial term and syntagm:
{myopathie, l’origine de la maladie}
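One possible reading of these four cases, sketched in code: the term is "full" when every component gloss is found in the syntagm, and the syntagm is "full" when all of its content words are component glosses. This interpretation, the stopword list and the function name are assumptions, not definitions taken from the slides.

```python
import re

# Small illustrative list of French function words ignored in the syntagm.
STOPWORDS = {"le", "la", "les", "l", "du", "de", "des", "d", "un", "une"}

def alignment_type(component_glosses, syntagm):
    """Classify a term/syntagm alignment as E1..E4, or None if nothing matches."""
    glosses = set(component_glosses)
    content = {w for w in re.findall(r"\w+", syntagm.lower()) if w not in STOPWORDS}
    matched = glosses & content
    if not matched:
        return None
    full_term = matched == glosses
    full_syntagm = content <= glosses
    if full_term:
        return "E1" if full_syntagm else "E2"
    return "E3" if full_syntagm else "E4"

glosses = ["muscle", "maladie"]  # myo = muscle, pathie = maladie
for syntagm in ["maladie du muscle", "maladie du muscle cardiaque",
                "la maladie", "l'origine de la maladie"]:
    print(syntagm, "->", alignment_type(glosses, syntagm))
```

Under this reading, the four example pairs above are classified as E1, E2, E3 and E4 respectively.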
Morphological composition
Evaluation
Nb of                   unigrams          bigrams           trigrams
                        b     l     s     b     l     s     b     l     s
correct paraphrases     549   785   644   378   517   461   195   290   257
poss. correct           39    32    67    22    45    75    10    19    41
processing of terms     47    60    44    28    28    46    9     10    26
incorrect paraphrases   33    146   296   64    80    380   25    39    148
Pstrict                 82    77    61    77    77    48    82    81    55
Pweak                   88    80    68    81    84    40    86    86    63
%incorrect              5     14    28    13    12    39    11    11    31
Evaluation:
strict precision 82 to 55%
weak precision 86 to 40%
error rate 5 to 39%
Resources
without: the best precision
morphology: good precision
synonymy: low precision
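The percentage rows of the evaluation table appear to follow from the count rows when each count is divided by the total number of evaluated paraphrases. The formulas below are inferred from the numbers rather than stated on the slide, and the function name is hypothetical; they reproduce the first unigram column (82, 88, 5).

```python
def precision_scores(correct, possibly_correct, term_processing, incorrect):
    """Strict precision, weak precision and error rate, as percentages."""
    total = correct + possibly_correct + term_processing + incorrect
    return {
        "strict": 100 * correct / total,
        "weak": 100 * (correct + possibly_correct) / total,
        "incorrect": 100 * incorrect / total,
    }

# Counts from the first unigram column of the evaluation table.
scores = precision_scores(549, 39, 47, 33)
print({name: round(value) for name, value in scores.items()})
# prints: {'strict': 82, 'weak': 88, 'incorrect': 5}
```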
Morphological composition
Morphological analysis
Ambiguous analysis
[post [[uro N*] [graphie N*] NOM] NOM]
[[posturo N*] [graphie N*] NOM]
Incorrect analysis
sanglot: lot and sang
exotique: externe and oreille
divin: deux and vin (deux litres de vin)
Morphological composition
Extraction of paraphrases and their evaluation
Correct paraphrases
raw
{podalgie, douleur du pied}
{mastite, inflammation du sein}
{cystoprostatectomie, ablation de la vessie et de la prostate}
Morphology
{desmorrhexie, rupture des ligaments} (ligament→ligaments)
{bronchite, inflammation des bronches/inflammation
bronchique} (bronche→bronches, bronche→bronchique)
{dentalgie, douleurs dentaires} (dents→dentaires)
Synonymy
{aclasie, absence de fracture} (cassure→fracture)
{enterectomie, résection des intestins} (ablation→résection)
Morphological composition
Extraction of paraphrases and their evaluation
Semantic relations between components:
well managed by data from corpora
errors: coordination/subordination
hematospermie: le sang ou le sperme, instead of
→ le sang dans le sperme
Non-compositional terms:
ostéodermie: peau and os, instead of
→ une structure d’écailles, de plaques osseuses ou d’autres
compositions dans les couches dermiques de la peau, comme
chez les lézards ou dinosaures
Comparison with existing work
                                term type   nb. para      precision
(Zeng et al., 2006)             all         CHV
(Elhadad & Sutaria, 2007)       all         152           0.58
(Deléger & Zweigenbaum, 2008)   m-synt.     65, 82        0.67, 0.60
(Cartoni & Deléger, 2011)       m-synt.     109           0.66
definitions                     all         1,028         0.52, 0.68
morphology                      compounds   1,128         0.76, 0.86
abbreviations                   abbr.       42, 8,106     0.74/0.94
reformulation                   all         96, 2,710     0.24/0.98
parentheses                     all         305, 92,971   0.23/0.68
morpho-syntactic:
{consommation régulière, consommer de façon régulière}
comparable performance, better coverage
Comparison with existing work
DériF (Namer, 2003):
gloss in formal language for every analyzed word
our method: coverage depends on content of corpora
myocarde:
"(Partie de – Type particulier de) coeur en rapport avec le(s)
muscle"
muscle du coeur
desmorrhexie:
"rupture (du – liée au) ligament"
rupture des ligaments
Conclusion
Detection of difficulties
in reading and understanding
Acquisition of resources
for explaining technical terms
Methods dedicated to different kinds of linguistic phenomena
paraphrases, reformulations...
Exploitation of general language corpora
Complementary methods
Interesting and exploitable results
Work in French
Future work
Increase the coverage of paraphrases and reformulations:
more corpora
comparables (Cochrane, patient package inserts, Wiki/Viki)
monolingual
more suppletive resources
other methods for extracting the paraphrases
Alignment with medical terminologies
Distribution of the resource
Other languages
Lexical simplification of medical texts
ANR project CLEAR (Communication, Literacy, Education,
Accessibility, Readability)
AMA (1999).
Health literacy: report of the council on scientific affairs. Ad hoc committee on
health literacy for the council on scientific affairs, American Medical Association.
JAMA, 281(6), 552–7.
Antoine, E. & Grabar, N. (2017).
Acquisition of expert/non-expert vocabulary from reformulations.
In MIE, Stud Health Technol Inform. 235, pp. 521–525.
Berland, G., Elliott, M., Morales, L., Algazy, J., Kravitz, R.,
Broder, M., Kanouse, D., Munoz, J., Puyol, J., Lara, M. et al. (2001).
Health information on the Internet: accessibility, quality, and readability in
English and Spanish.
JAMA, 285(20), 2612–2621.
Bertram, R., Kuperman, V., Baayen, H. R. & Hyönä, J. (2011).
The hyphen as a segmentation cue in triconstituent compound processing: It’s
getting better all the time.
Scandinavian Journal of Psychology, 52(6), 530–544.
Beyersmann, E., Coltheart, M. & Castles, A. (2012).
Parallel processing of whole words and morphemes in visual word recognition.
The Quarterly Journal of Experimental Psychology, 65(9), 1798–1819.
Bozic, M., Marslen-Wilson, W. D., Stamatakis, E. A., Davis, M. H. &
Tyler, L. K. (2007).
Differentiating morphology, form, and meaning: Neural correlates of
morphological complexity.
Journal of Cognitive Neuroscience, 19(9), 1464–1475.
Cartoni, B. & Deléger, L. (2011).
Découverte de patrons paraphrastiques en corpus comparable: une approche
basée sur les n-grammes.
In Traitement Automatique des Langues Naturelles (TALN).
Chmielik, J. & Grabar, N. (2009).
Comparative study between expert and non-expert biomedical writings: their
morphology and semantics.
Stud Health Technol Inform., 150, 359–63.
Chmielik, J. & Grabar, N. (2011).
Détection de la spécialisation scientifique et technique des documents
biomédicaux grâce aux informations morphologiques.
TAL, 51(2), 151–179.
Côté, R. A., Rothwell, D. J., Palotay, J. L., Beckett, R. S. &
Brochu, L. (1993).
The Systematised Nomenclature of Human and Veterinary Medicine: SNOMED
International.
Northfield: College of American Pathologists.
Deléger, L. & Zweigenbaum, P. (2008).
Paraphrase acquisition from comparable medical corpora of specialized and lay
texts.
In Ann Symp Am Med Inform Assoc (AMIA), pp. 146–50.
Dohmes, P., Zwitserlood, P. & Bölte, J. (2004).
The impact of semantic transparency of morphologically complex words on
picture naming.
Brain and Language, 90(1-3), 203–212.
Elhadad, N. & Sutaria, K. (2007).
Mining a lexicon of technical terms and lay equivalents.
In BioNLP, pp. 49–56.
Flesch, R. (1948).
A new readability yardstick.
Journ Appl Psychol, 32, 221–233.
Frisson, S., Niswander-Klement, E. & Pollatsek, A. (2008).
The role of semantic transparency in the processing of English compound words.
Br J Psychol, 99(1), 87–107.
Glavas, G. & Stajner, S. (2015).
Simplifying lexical simplification: Do we need simplified corpora?
In ACL-COLING, pp. 63–68.
Goeuriot, L., Grabar, N. & Daille, B. (2007).
Caractérisation des discours scientifique et vulgarisé en français, japonais et
russe.
In Traitement Automatique des Langues Naturelles (TALN), pp. 93–102.
Grabar, N., Farce, E. & Sparrow, L. (2018).
Étude de la lisibilité des documents de santé avec des méthodes d’oculométrie.
In Traitement Automatique des Langues Naturelles (TALN), pp. 1–14.
Grabar, N. & Hamon, T. (2014).
Automatic extraction of layman names for technical medical terms.
In ICHI 2014, Pavia, Italy.
Grabar, N. & Hamon, T. (2016).
Exploitation de la morphologie pour l’extraction automatique de paraphrases
grand public des termes médicaux.
TAL, 57(1), 85–109.
Grabar, N., Hamon, T. & Amiot, D. (2014).
Automatic diagnosis of understanding of medical words.
In EACL PITR Workshop, pp. 11–20.
Grabar, N., Krivine, S. & Jaulent, M. (2007).
Classification of health webpages as expert and non expert with a reduced set of
cross-language features.
In Ann Symp Am Med Inform Assoc (AMIA), pp. 284–288.
Gunning, R. (1973).
The art of clear writing.
New York, NY: McGraw Hill.
Jarema, G., Busson, C., Nikolova, R., Tsapkini, K. & Libben, G. (1999).
Processing compounds: A cross-linguistic study.
Brain and Language, 68(1-2), 362–369.
Kim, Y.-S., Hullman, J., Burgess, M. & Adar, E. (2016).
Simplescience: Lexical simplification of scientific terminology.
In EMNLP, pp. 1–6.
Koester, D. & Schiller, N. O. (2011).
The functional neuroanatomy of morphology in language production.
NeuroImage, 55(2), 732–741.
Kokkinakis, D. & Toporowska Gronostaj, M. (2006).
Comparing lay and professional language in cardiovascular disorders corpora.
In A. Pham T., James Cook University, Ed., WSEAS Transactions on
BIOLOGY and BIOMEDICINE, pp. 429–437.
Laurent, D., Nègre, S. & Séguéla, P. (2009).
L’analyseur syntaxique Cordial dans Passage.
In Traitement Automatique des Langues Naturelles (TALN).
Le Bot, M.-C., Schuwer, M. & Richard, É. (dir.) (2008).
La reformulation : Marqueurs linguistiques – Stratégies énonciatives.
Rennes: Rivages linguistiques.
Leroy, G., Helmreich, S., Cowie, J., Miller, T. & Zheng, W. (2008).
Evaluating online health information: Beyond readability formulas.
In Ann Symp Am Med Inform Assoc (AMIA), pp. 394–8.
Libben, G., Gibson, M., Yoon, Y. B. & Sandra, D. (2003).
Compound fracture: The role of semantic transparency and morphological
headedness.
Brain and Language, 84(1), 50–64.
Lüttmann, H., Zwitserlood, P. & Bölte, J. (2011).
Sharing morphemes without sharing meaning: Production and comprehension of
German verbs in the context of morphological relatives.
Canadian Journal of Experimental Psychology/Revue canadienne de psychologie
expérimentale, 65(3), 173–191.
McCray, A. (2005).
Promoting health literacy.
J of Am Med Infor Ass, 12, 152–163.
McCray, A., Loane, R., Browne, A. & Bangalore, A. (1999).
Terminology issues in user access to web-based medical information.
In Ann Symp Am Med Inform Assoc (AMIA), pp. 107–7.
Namer, F. (2003).
Automatiser l’analyse morpho-sémantique non affixale: le système DériF.
Cahiers de Grammaire, 28, 31–48.
Patel, V., Branch, T. & Arocha, J. (2002).
Errors in interpreting quantities as procedures: The case of pharmaceutical
labels.
Int Journ Med Inform, 65(3), 193–211.
Péry-Woodley, M. & Rebeyrolle, J. (1998).
Domain and genre in sublanguage text: definitional microtexts in three corpora.
In LREC, pp. 987–992.
Poprat, M., Markó, K. & Hahn, U. (2006).
A language classifier that automatically divides medical documents for experts
and health care consumers.
In Int Congress of the European Federation for Medical Informatics, pp.
503–508, Maastricht.
Quinlan, J. (1993).
C4.5: Programs for Machine Learning.
San Mateo, CA: Morgan Kaufmann.
Specia, L., Jauhar, S. & Mihalcea, R. (2012).
Semeval-2012 task 1: English lexical simplification.
In *SEM 2012, pp. 347–355.
Tapi Nzali, M., Bringay, S., Lavergne, C., Opitz, T., Azé, J. &
Mollevi, C. (2015).
Construction d’un vocabulaire patient/médecin dédié au cancer du sein à partir
des médias sociaux.
In IC 2015.
Tran, T., Chekroud, H., Thiery, P. & Julienne, A. (2009).
Internet et soins : un tiers invisible dans la relation médecine/patient ?
Ethica Clinica, 53, 34–43.
Wang, Y. (2006).
Automatic recognition of text difficulty from consumers health information.
In IEEE, Ed., Computer-Based Medical Systems, pp. 131–136.
Williams, M., Parker, R., Baker, D., Parikh, N., Pitkin, K., Coates,
W. & Nurss, J. (1995).
Inadequate functional health literacy among patients at two public hospitals.
JAMA, 274(21), 1677–1682.
Wubben, S., van den Bosch, A. & Krahmer, E. (2012).
Sentence simplification by monolingual machine translation.
In Annual Meeting of the Association for Computational Linguistics, pp.
1015–1024.
Yatskar, M., Pang, B., Danescu-Niculescu-Mizil, C. & Lee, L. (2010).
For the sake of simplicity: Unsupervised extraction of lexical simplifications from
Wikipedia.
In NAACL, pp. 365–368.
Zeng, Q. & Tse, T. (2006).
Exploring and developing consumer health vocabularies.
JAMIA, 13, 24–29.
Zeng, Q. T., Tse, T., Divita, G., Keselman, A., Crowell, J. & Browne,
A. C. (2006).
Exploring lexical forms: first-generation consumer health vocabularies.
In Ann Symp Am Med Inform Assoc (AMIA), pp. 1155–1155.
Zeng-Treitler, Q., Kim, H., Goryachev, S., Keselman, A., Slaughter, L.
& Smith, C. (2007).
Text characteristics of clinical reports and their implications for the readability of
personal health records.
In MEDINFO, pp. 1117–1121, Brisbane, Australia.
Zheng, W., Milios, E. & Watters, C. (2002).
Filtering for medical news items using a machine learning approach.
In Ann Symp Am Med Inform Assoc (AMIA), pp. 949–953.
Zhu, Z., Bernhard, D. & Gurevych, I. (2010).
A monolingual tree-based translation model for sentence simplification.
In COLING 2010, pp. 1353–1361.

Grammarly AI-NLP Club #5 - Automatic text simplification in the biomedical domain - Natalia Grabar

  • 1. Context Difficulty Paraphrases Conclusion Automatic text simplification in biomedical domain Natalia Grabar STL CNRS UMR8163, France Grammarly, Kyiv, Ukraine: 21/08/2018 1/45 Automatic text simplification in biomedical domain Natalia Grabar
  • 5. Background. Lviv University: Languages, Linguistics. Master, PhD (INaLCO, Université Paris 6): NLP, medical area, terminology, acquisition of lexical resources. PostDoc, AHU (Inserm, Fondation HON Geneva): information retrieval, quality of information, discourse analysis, typology, information for non-specialized users. Researcher (CNRS): information for non-specialized users, semantic annotation, information extraction.
  • 6. Automatic text simplification in biomedical domain (work in French). Outline: 1. Context; 2. Detection of difficulties; 3. Acquisition of paraphrases; 4. Conclusion.
  • 7. Context. Evolution of the biomedical domain: specific knowledge and terms. Different kinds of users (medical staff, pharmacists, students, patients...) with various levels of specialization. Patients: quality of information; technicality and understanding of health information ⇒ close relation with the health and well-being of people (AMA, 1999; Berland et al., 2001; McCray, 2005; Tran et al., 2009).
  • 8. Readability of health documents. Health information must be readable, understandable and usable in different situations: following up treatments, making decisions (chronic disorders), communicating with medical doctors, making the healthcare process successful. Real difficulty: understanding the steps of the correct intake of drugs (Patel et al., 2002). Among 2,600 US patients (2 hospitals), 26% to 60% cannot understand instructions on drug intake, informed consent forms or health brochures (Williams et al., 1995). Documents and health websites designed for patients often show high technicality (Berland et al., 2001).
  • 9. Objective. Make health documents and medical terms easier for patients to understand: detect reading difficulties; propose common paraphrases for technical terms. [Figure: pipeline from Text to Simplified text, with the steps Diagnosis of text, Detection of difficult words, and Simplification/decoration; labels: ref. model, res., rules.] Interdisciplinary research: linguistics, psychology, terminology, NLP...
  • 10. Detection of difficulties. Outline: 1. Context; 2. Detection of difficulties; 3. Acquisition of paraphrases; 4. Conclusion.
  • 11. Detection of difficulties (documents): existing work. Text typology; diagnosis of text readability. Classical measures: Flesch (Flesch, 1948), Fog (Gunning, 1973)... Computational measures: classical measures and medical vocabulary (Kokkinakis & Toporowska Gronostaj, 2006); n-grams of characters (Poprat et al., 2006); manual weighting of words (Zheng et al., 2002); morphology (Chmielik & Grabar, 2009); stylistic criteria (Grabar et al., 2007); discursive criteria (Goeuriot et al., 2007); various combinations (Wang, 2006; Zeng-Treitler et al., 2007; Goeuriot et al., 2007; Leroy et al., 2008)...
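As an illustration of the classical measures named on this slide, here is a minimal sketch of the Flesch Reading Ease score (Flesch, 1948). The coefficients are those of the English formula; the syllable counter is a crude vowel-group heuristic and an assumption of this sketch, not something taken from the talk, and a French corpus would need an adapted formula.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel groups (heuristic, English only)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    easy = "The cat sat on the mat."
    hard = "Echocardiography demonstrated hypertrophic cardiomyopathy."
    print(flesch_reading_ease(easy), flesch_reading_ease(hard))
```

Such surface measures only look at word and sentence length, which is one reason the slide lists computational measures that add vocabulary, morphology and stylistic criteria.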
  • 12. Detection of difficulties (documents): results (Chmielik & Grabar, 2009; Chmielik & Grabar, 2011). [Figure: two plots with axes from 0 to 1, one for lexical features and one for morphological features.] Decision trees C4.5 (Quinlan, 1993), 10-fold cross-validation.
  • 13. Detection of difficulties (words): existing work. Facilitators: hyphen (Bertram et al., 2011), space (Frisson et al., 2008), morphological closeness (Lüttmann et al., 2011), primes (Bozic et al., 2007; Beyersmann et al., 2012), pictures (Dohmes et al., 2004; Koester & Schiller, 2011), etc. Morphological head (Jarema et al., 1999; Libben et al., 2003). NLP challenges (Specia et al., 2012): for a short text and a given word, several possible substitutions that fit the context are proposed → sort the substitutions according to their simplicity. Descriptors: Google n-grams, WordNet, word length, number of syllables, mutual information, frequency...
  • 14. Detection of difficulties. Psychology: eye-tracking (Grabar et al., 2018). Eye-tracking: recording eye movements during reading. Several indicators: fixations, periods during which the eyes are stable (visual information is analyzed); saccades, rapid movements of the eyes from one point to another; regressions, backward movements.
  • 15. Detection of difficulties: eye-tracking, text1. Original version: EXAMEN : ECHOGRAPHIE DES MAINS ET DES PIEDS. MOTIF : Bilan d’arthralgies. Mains : On ne visualise pas de ténosynovite, ou d’arthrosynovite. Avant-pieds : On retrouve des remaniements intéressant les premières métatarsophalangiennes en rapport avec des antécédents de chirurgie d’Hallux valgus. Absence d’arthrosynovite au niveau des articulations métatarsophalangiennes. Simplified version: EXAMEN : ECHOGRAPHIE DES MAINS ET DES PIEDS. MOTIF : Bilan de douleurs articulaires. Mains : On ne visualise pas d’inflammation des tendons, ni de la membrane articulaire. Avant-pieds : On retrouve des remaniements intéressants sur les premières articulations des pieds en rapport avec les antécédents de la chirurgie de la déformation du pied. Absence d’inflammation de la membrane au niveau des articulations du pied.
  • 16. Detection of difficulties: eye-tracking, text2. Original version: Cette patiente avait constitué un infarctus du myocarde antérieur en novembre 2010, pour lequel avait été réalisée une angioplastie de l’IVA moyenne avec implantation d’un stent non actif Vision de 2.75 mm x 18 mm, un complément par angioplastie au ballon seul en aval. Une endoprothèse avait également été implantée au niveau de la circonflexe proximale, avec un stent Vision 2.5 x 18 mm. La fraction d’éjection était évaluée entre 35 et 40 %. Nous l’avions revue récemment, en insuffisance cardiaque, avec plusieurs autres problèmes : - une anémie microcytaire inexpliquée, - un déséquilibre important de son diabète pour lequel elle a été, entre temps, prise en charge par nos confrères diabétologues. Simplified version: Cette patiente avait présenté une crise cardiaque en novembre 2010, pour laquelle avait été réalisée une intervention chirurgicale de l’artère cardiaque avec implantation d’un stent non actif. Un autre stent avait également été implanté au niveau d’une autre artère. La fraction d’éjection observée était basse. Nous l’avions revue récemment, en insuffisance cardiaque, avec plusieurs autres problèmes : - une anémie inexpliquée, - un déséquilibre important de son diabète pour lequel elle a été, entre temps,
  • 17. Detection of difficulties: eye-tracking, results on text1.
  • 18. Detection of difficulties: eye-tracking, results on text2.
  • 19. Detection of difficulties: eye-tracking, results. O and S are the original and simplified versions, SD is their difference (O − S), df the degrees of freedom.
      text1
      Indicator         O         S        SD      p     df   t-test
      TRN           60.55     63.63     -3.08   0.23     45     1.22
      CRL           58.88     62.06     -3.19   0.22     45     1.25
      DPF          227.41    215.75     11.66   0.11     45     1.65
      NTF          587.61    370.48    217.14   0.00     45     7.38
      AMP            3.50      3.80     -0.30   0.02     45     2.44
      REG           27.26     21.21      6.06   0.05     45     2.05
      QCM         1304.35    869.57    434.78   0.02     21     2.08
      text2
      Indicator         O         S        SD      p     df   t-test
      TRN           62.73     59.67      3.06   0.22     45     1.24
      CRL           61.04     57.84      3.20   0.21     45     1.29
      DPF          214.73    214.69      0.04   0.50     45     0.68
      NTF          395.71    372.22     23.49   0.16     45     1.43
      AMP            3.33      3.82     -0.49   0.00     45     5.38
      REG           21.47     19.30      2.18   0.24     45     1.18
      QCM          602.77    538.95     63.82   0.00     21     2.08
    TRN, CRL: stable reading. DPF: no anticipation. NTF, AMP, REG: better significance on text1. QCM: better understanding with simplified versions.
  • 20. Detection of difficulties: NLP (Grabar et al., 2014). Medical words from Snomed International (Côté et al., 1993): 29,641 lemmatized words. Manually annotated by 3 independent annotators with three categories: 1. I can understand; 2. I am not sure; 3. I cannot understand. Inter-annotator agreement: Cohen’s Kappa 0.736. NLP task: supervised categorization that automatically reproduces the manual annotations: F=0.90. 24 descriptors: syntactic and morphological information, reference lexica, frequency, length, initial and final substrings, readability scores...
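The agreement figure on this slide is a Cohen's Kappa. As a reminder of what that statistic measures, the sketch below computes it for two annotators over the three categories (1, 2, 3). How the talk combined three annotators into one value is not stated, so the pairwise setting and the toy labels here are assumptions for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical annotations: 1 = can understand, 2 = not sure, 3 = cannot understand.
    annotator_1 = [1, 1, 2, 3, 3, 1]
    annotator_2 = [1, 2, 2, 3, 3, 1]
    print(cohen_kappa(annotator_1, annotator_2))
```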
  • 21. Detection of difficulties: NLP.
  • 22. Detection of difficulties: typology. Abbreviations (OG, VG, PAPS, j, bat, cp); proper names (Gougerot, Sjögren, Bentall, Glasgow, Babinski, Barthel, Cockcroft); drug names; neoclassical compounds for disorders, procedures and treatments (pseudohémophilie, sclérodermie, hydrolase, tympanectomie, arthrodèse, synesthésie); borrowings from Latin or English; human anatomy (cloacal, pubovaginal, nasopharyngé, mitral, antre, inguinal, strontium, érythème, maxillo-facial, mésentère); lab test results.
  • 23. Acquisition of paraphrases. Outline: 1. Context; 2. Detection of difficulties; 3. Acquisition of paraphrases; 4. Conclusion.
  • 24. Acquisition of paraphrases: existing work, general language. Revisions of Simple Wikipedia articles (Yatskar et al., 2010): probabilistic models and filters; between 1,079 and 2,970 pairs, e.g. {stands for, is the same as}, {indigenous, native}; precision: 17% to 86%. Methods from machine translation (Zhu et al., 2010; Wubben et al., 2012): parallel and aligned corpora (Wikipedia/Simple Wikipedia). Distributional methods (Glavas & Stajner, 2015; Kim et al., 2016): monolingual corpora; vectors can contain equivalents that are easier to understand; filtering.
  • 25. Acquisition of paraphrases: existing work, medical language. Automatic translator of medical terms to general language (McCray et al., 1999): MEDLINEplus (brochures). Consumer Health Vocabulary (CHV) (Zeng & Tse, 2006): collaborative approach. Morpho-syntactic variants (Deléger & Zweigenbaum, 2008; Cartoni & Deléger, 2011): {consommation régulière, consommer de façon régulière}, {gêne à la lecture, empêche de lire}. Social media specificities (Tapi Nzali et al., 2015): misspellings {cirrhose, cyrose}, {métastase, metastase}; reduced words {oncologue, onco}, {chimiothérapie, chimio}.
  • 26. Acquisition of paraphrases. Definitions (Antoine & Grabar, 2017); reformulations (Antoine & Grabar, 2017); morphological composition (Grabar & Hamon, 2014; Grabar & Hamon, 2016).
  • 27. Definitions: methods. A definition is a structure with two elements: the definiendum (term to define) and the definiens (the definition), e.g. Myocarde est le tissu musculaire du coeur. Use of four patterns (Péry-Woodley & Rebeyrolle, 1998), with their inflectional variants: désigne (means), est un (is a), est appelé (is called), peut être défini comme (can be defined as). Trigger: term.
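The pattern-based extraction described on this slide can be sketched with regular expressions. This is only an illustration under simplifying assumptions: the handful of inflectional variants, the function name `extract_definitions`, and the restriction to sentences where the term directly precedes the marker are choices of this sketch, not the method used in the talk.

```python
import re

# The four definitional markers from the slide, with a few inflectional
# variants (the variant list is an assumption of this sketch).
PATTERNS = [
    r"désigne(?:nt)?",
    r"(?:est|sont) (?:un|une|des)",
    r"(?:est|sont) appelée?s?",
    r"(?:peut|peuvent) être définie?s? comme",
]


def extract_definitions(sentence, terms):
    """Return (term, marker, definiens) triples for terms followed by a marker."""
    results = []
    for term in terms:  # the term acts as the trigger
        for pattern in PATTERNS:
            regex = re.compile(
                rf"\b{re.escape(term)}\s+(?P<marker>{pattern})\s+(?P<definiens>.+)",
                re.IGNORECASE,
            )
            match = regex.search(sentence)
            if match:
                definiens = match.group("definiens").rstrip(" .")
                results.append((term, match.group("marker"), definiens))
                break  # keep the first pattern that fires for this term
    return results


if __name__ == "__main__":
    sentence = "L’hypoglycémie est un manque de sucre dans l’organisme."
    print(extract_definitions(sentence, ["hypoglycémie"]))
```

Sentences where the term follows the marker (for example "... est appelée péricarde") would need a mirrored pattern, which this sketch leaves out.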
  • 28. Definitions: results. Extraction: 2,037 definitions, 1,286 unique terms. Evaluation: strict precision 52.5% (correct definitions: 849); weak precision 68% (correct and possibly correct definitions: 1,028). Types of terms: compound terms (hypoglycémie, acidocétose, angiographie, hypokaliémie); affixed terms (curetage, capsulite, arthrose, glaucome, durillon, pré-diabète); non-constructed terms (cataracte, impétigo, zona).
  • 29. Definitions: results (extracted examples). L’hypoglycémie est un manque de sucre dans l’organisme; Une septicémie est un empoisonnement du sang du à un microbe; Le curetage est un nettoyage en profondeur d’une gencive inflammée; Pour un être humain adulte, une hypoglycémie est une glycémie inférieure à 0,8 g/L; Les signes classiques annonciateurs de l’hypoglycémie sont des sueurs, pâleur, palpitations, fringales en particulier; L’impétigo est une infection cutanée, qui provoque des pustules qui dégénèrent en croûtes jaunâtres, l’impétigo est due à...
  • 30. Definitions: results. Readability (péricarde): + La couche extérieure du cœur est appelée péricarde. ∼ Le péricarde est un sac à double paroi contenant le cœur et les racines des gros vaisseaux sanguins. − Le péricarde est un organe de glissement, formé de deux feuillets limitant une cavité virtuelle, la cavité péricardique, qui permet les mouvements cardiaques.
  • 31. Reformulations: motivation. Reformulation: saying something differently (Le Bot et al., 2008). The occurrence of a reformulation indicates the presence of difficult words/terms and provides triggers for the extraction. Exploit reliable data: health fora with moderators, Wikipedia.
  • 32. Reformulations: methods. Structure: concept, marker, reformulation, e.g. vésiculaire, c’est-à-dire, venant de la vésicule biliaire. 3 markers: c’est-à-dire (I mean); autrement dit / Autrement dit (in other words); encore appelé(e)(s) (also called). Pre-processing: POS-tagging and syntactic analysis by Cordial (Laurent et al., 2009). Trigger: markers. Extraction of the concept and of the reformulation: syntactic information; boundaries: syntagms or propositions.
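To make the concept / marker / reformulation structure concrete, the following sketch splits a segment around one of the three markers. It is deliberately naive: the talk sets boundaries with syntactic information from Cordial, whereas this sketch assumes the segment has already been cut to the relevant syntagms, and the function name `split_reformulation` is hypothetical.

```python
import re

# The three markers from the slide; optional commas around them are absorbed.
MARKER_RE = re.compile(
    r"\s*,?\s*(?P<marker>c[’']est-à-dire|autrement dit|encore appelée?s?)\s*,?\s*",
    re.IGNORECASE,
)


def split_reformulation(segment):
    """Return (concept, marker, reformulation), or None if no marker is found."""
    match = MARKER_RE.search(segment)  # the marker is the trigger
    if match is None:
        return None
    concept = segment[: match.start()].strip(" ,.;")
    reformulation = segment[match.end():].strip(" ,.;")
    if not concept or not reformulation:
        return None  # a marker with an empty side is not a usable pair
    return concept, match.group("marker"), reformulation


if __name__ == "__main__":
    print(split_reformulation("vésiculaire, c’est-à-dire, venant de la vésicule biliaire"))
    print(split_reformulation("par une lithiase c’est-à-dire un caillou"))
```

On full sentences this naive split would keep too much context on both sides, which is the boundary-detection difficulty the evaluation slide reports.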
  • 33. Reformulations (tagged and parsed example).
      form           lemma          POS        POSMT     GS     type GS   Prop
      Vous           vous           PPER2P     Pp2.pn    1      S         1
      ne             ne             ADV        Rpn       3-1    S         1
      devez          devoir         VINDP2P    Vmip2p    3      V         1
      pas            pas            ADV        Rgn       3      Q         1
      employer       employer       VINF       Vmn–      5      D         2
      de             de             PREP       Sp        7      D         2
      savons         savon          NCMP       Ncmp      7      D         2
      ou             ou             COO        Cc        7      F         2
      des            de le          DETDPIG    Da-.p-i   10-7   F         2
      laits          lait           NCMP       Ncmp      10-7   F         2
      sophistiqués   sophistiqué    ADJMP      Afpmp     10-7   F         2
      ,              ,              PCTFAIB    Ypw       -      -         2
      c’             ce             PDS        Pd-..-    13     N         2
      est            est            ADV        Rgp       -      p         2
      -à             à              PREP       Sp        16     F         2
      -dire          dire           VINF       Vmn–      16     F         2
      contenant      contenant      NCMS       Ncms      17     D         2
      plusieurs      plusieurs      ADJIND     Dt-.p-    19     D         2
      composants     composant      NCMP       Ncmp      19     D         2
  • 36. Reformulations: evaluation. Data: development set 96 occurrences (96 types); test set 2,757 occurrences (2,710 types). Results (P, R, F): exact 0.24, 0.24, 0.24; inexact 0.98, 0.98, 0.98. Difficulties: detection of boundaries (en c’est-à-dire au contact du sang circulant; une toxi-infection, c’est-à-dire, qu’ elle peut); semantics (en 10 ans autrement dit sur 64 millions de personnes; un objectif c’est-à-dire une finalité).
  • 37. Reformulations: results (extracted examples). des canaux galactophores c’est-à-dire sécrètent le lait; erratiques c’est-à-dire qu’ils changent de d’aspect et d’endroit; par une lithiase c’est-à-dire un caillou; clivage du moi c’est-à-dire comme une opposition entre le moi et la réalité; au gré de la désintégration radioactive du 18 F c’est-à-dire avec une demi-vie d’environ; un trouble de l’identité sexuelle c’est-à-dire qu’ils s’identifient à un genre ne correspondant pas à leur sexe biologique; une enzyme protéolytique c’est-à-dire digère les protéines comme le fait le suc pancréatique; celle de troubles fonctionnels intestinaux encore appelés colopathie fonctionnelle.
  • 38. Morphological composition. [Figure: workflow with the boxes Medical terms, POS-tagging, Morphological analysis of components, Translation, Corpus, POS-tagging, Syntactic analysis, Alignment, Evaluation.] Processing of terms: myocarde; myocarde/Nom; [[[myo N*] [carde N*] NOM] ique ADJ]; myo=muscle, carde=coeur. Processing of corpus: Les causes de tachycardie ventriculaire sont superposables à celles des extrasystoles ventriculaires: infarctus du myocarde, insuffisance cardiaque, hypertrophie du muscle du cœur et prolapsus de la valve mitrale.
  • 39. Morphological composition. [Figure: workflow with the boxes Medical terms, POS-tagging, Morphological analysis of components, Translation, Corpus, POS-tagging, Syntactic analysis, Alignment, Evaluation.] Processing of terms: myocarde; myocarde/Nom; [[[myo N*] [carde N*] NOM] ique ADJ]; myo=muscle, carde=coeur. Processing of corpus, with the aligned syntagm bracketed: Les causes de tachycardie ventriculaire sont superposables à celles des extrasystoles ventriculaires: infarctus du myocarde, insuffisance cardiaque, [hypertrophie du [muscle du cœur]] et prolapsus de la valve mitrale.
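The term-to-corpus alignment outlined on these two slides can be approximated as follows: split a compound into components, translate each component into a lay word, and keep the corpus syntagms that contain all the translations. The tiny gloss dictionary, the greedy segmentation and the function names are assumptions of this sketch; the talk relies on a morphological analyzer and on syntactic analysis of the corpus instead.

```python
import re

# Hypothetical component glosses (component -> lay word), for illustration only.
COMPONENT_GLOSSES = {"myo": "muscle", "carde": "cœur", "pod": "pied", "algie": "douleur"}


def decompose(term, glosses):
    """Greedy left-to-right segmentation into known components; None on failure."""
    parts, rest = [], term.lower()
    while rest:
        candidates = (c for c in glosses if rest.startswith(c))
        component = max(candidates, key=len, default=None)
        if component is None:
            return None
        parts.append(component)
        rest = rest[len(component):]
    return parts


def find_paraphrases(term, corpus_syntagms, glosses=COMPONENT_GLOSSES):
    """Return the syntagms that contain the lay word of every component."""
    parts = decompose(term, glosses)
    if parts is None:
        return []
    wanted = {glosses[p] for p in parts}
    hits = []
    for syntagm in corpus_syntagms:
        tokens = set(re.findall(r"\w+", syntagm.lower()))
        if wanted <= tokens:  # every translated component occurs in the syntagm
            hits.append(syntagm)
    return hits


if __name__ == "__main__":
    syntagms = ["hypertrophie du muscle du cœur", "muscle du cœur", "insuffisance cardiaque"]
    print(find_paraphrases("myocarde", syntagms))
```

A bag-of-words match like this ignores word order and the relation between components, so it cannot tell "le sang dans le sperme" from "le sang ou le sperme".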
  • 40. Morphological composition: results. Alignment syntagm/term (percentage of alignment): E1, full term and syntagm: {myopathie, maladie du muscle}; E2, full term, partial syntagm: {myopathie, maladie du muscle cardiaque}; E3, partial term, full syntagm: {myopathie, la maladie}; E4, partial term and syntagm: {myopathie, l’origine de la maladie}.
  • 41. Morphological composition: evaluation.
      Nb of                     unigrams           bigrams            trigrams
                                b     l     s      b     l     s      b     l     s
      correct paraphrases     549   785   644    378   517   461    195   290   257
      poss. correct            39    32    67     22    45    75     10    19    41
      processing of terms      47    60    44     28    28    46      9    10    26
      incorrect paraphrases    33   146   296     64    80   380     25    39   148
      Pstrict                  82    77    61     77    77    48     82    81    55
      Pweak                    88    80    68     81    84    40     86    86    63
      %incorrect                5    14    28     13    12    39     11    11    31
    Evaluation: strict precision 82 to 55%; weak precision 86 to 40%; error rate 5 to 39%. Resources: without, the best precision; morphology, good precision; synonymy, low precision.
  • 42. Morphological composition: morphological analysis. Ambiguous analysis: [post [[uro N*] [graphie N*] NOM] NOM] or [[posturo N*] [graphie N*] NOM]. Incorrect analysis: sanglot: lot and sang; exotique: externe and oreille; divin: deux and vin (deux litres de vin).
  • 43. Morphological composition: extraction of paraphrases and their evaluation. Correct paraphrases. Raw: {podalgie, douleur du pied}; {mastite, inflammation du sein}; {cystoprostatectomie, ablation de la vessie et de la prostate}. Morphology: {desmorrhexie, rupture des ligaments} (ligament→ligaments); {bronchite, inflammation des bronches/inflammation bronchique} (bronche→bronches, bronche→bronchique); {dentalgie, douleurs dentaires} (dents→dentaires). Synonymy: {aclasie, absence de fracture} (cassure→fracture); {enterectomie, résection des intestins} (ablation→résection).
  • 44. Morphological composition: extraction of paraphrases and their evaluation. Semantic relations between components: well managed by data from corpora; errors: coordination/subordination, e.g. hematospermie: le sang ou le sperme, instead of → le sang dans le sperme. Non-compositional terms: ostéodermie: peau and os, instead of → une structure d’écailles, de plaques osseuses ou d’autres compositions dans les couches dermiques de la peau, comme chez les lézards ou dinosaures.
  • 45. Comparison with existing work.
      Work                            term type    nb. para       precision
      (Zeng et al., 2006)             all          CHV
      (Elhadad & Sutaria, 2007)       all          152            0.58
      (Deléger & Zweigenbaum, 2008)   m-synt.      65, 82         0.67, 0.60
      (Cartoni & Deléger, 2011)       m-synt.      109            0.66
      definitions                     all          1,028          0.52, 0.68
      morphology                      compounds    1,128          0.76, 0.86
      abbreviations                   abbr.        42, 8,106      0.74/0.94
      reformulation                   all          96, 2,710      0.24/0.98
      parentheses                     all          305, 92,971    0.23/0.68
    Morpho-syntactic (m-synt.): {consommation régulière, consommer de façon régulière}. Comparable performance, better coverage.
  • 46. Comparison with existing work. DériF (Namer, 2003): gloss in formal language for every analyzed word. Our method: coverage depends on content of corpora. Examples (DériF gloss, then our paraphrase): myocarde: "(Partie de – Type particulier de) coeur en rapport avec le(s) muscle", muscle du coeur; desmorrhexie: "rupture (du – liée au) ligament", rupture des ligaments.
  • 47. Conclusion. Detection of difficulties in reading and understanding. Acquisition of resources for explaining technical terms. Methods dedicated to different kinds of linguistic phenomena: paraphrases, reformulations... Exploitation of general language corpora. Complementary methods. Interesting and exploitable results. Work in French. [Figure: pipeline from Text to Simplified text, with the steps Diagnosis of text, Detection of difficult words, and Simplification/decoration; labels: ref. model, res., rules.]
  • 48. Future work. Increase the coverage of paraphrases and reformulations: more corpora, comparable (Cochrane, patient package inserts, Wiki/Viki) and monolingual; more suppletive resources; other methods for extracting the paraphrases. Alignment with medical terminologies. Distribution of the resource. Other languages. Lexical simplification of medical texts. ANR project CLEAR (Communication, Literacy, Education, Accessibility, Readability). [Figure: pipeline from Text to Simplified text, with the steps Diagnosis of text, Detection of difficult words, and Simplification/decoration; labels: ref. model, res., rules.]
  • 49. Context Difficulty Paraphrases Conclusion AMA (1999). Health literacy: report of the council on scientific affairs. Ad hoc committee on health literacy for the council on scientific affairs, American Medical Association. JAMA, 281(6), 552–7. Antoine, E. & Grabar, N. (2017). Acquisition of expert/non-expert vocabulary from reformulations. In MIE, Stud Health Technol Inform. 235, pp. 521–525. Berland, G., Elliott, M., Morales, L., Algazy, J., Kravitz, R., Broder, M., Kanouse, D., Munoz, J., Puyol, J. & et al, M. L. (2001). Health information on the internet. accessibility, quality, and readability in english ans spanish. JAMA, 285(20), 2612–2621. Bertram, R., Kuperman, V., Baayen, H. R. & Hy¨on¨a, J. (2011). The hyphen as a segmentation cue in triconstituent compound processing: It’s getting better all the time. Scandinavian Journal of Psychology, 52(6), 530–544. Beyersmann, E., Coltheart, M. & Castles, A. (2012). Parallel processing of whole words and morphemes in visual word recognition. The Quarterly Journal of Experimental Psychology, 65(9), 1798–1819. Bozic, M., Marslen-Wilson, W. D., Stamatakis, E. A., Davis, M. H. & Tyler, L. K. (2007). Differentiating morphology, form, and meaning: Neural correlates of morphological complexity. 45/45 Automatic text simplification in biomedical domain Natalia Grabar
  • 50. Context Difficulty Paraphrases Conclusion Journal of Cognitive Neuroscience, 19(9), 1464–1475. Cartoni, B. & Del´eger, L. (2011). D´ecouverte de patrons paraphrastiques en corpus comparable: une approche bas´ee sur les n-grammes. In Traitement Automatique des Langues Naturelles (TALN). Chmielik, J. & Grabar, N. (2009). Comparative study between expert and non-expert biomedical writings: their morphology and semantics. Stud Health Technol Inform., 150, 359–63. Chmielik, J. & Grabar, N. (2011). D´etection de la sp´ecialisation scientifique et technique des documents biom´edicaux grˆace aux informations morphologiques. TAL, 51(2), 151–179. Cˆot´e, R. A., Rothwell, D. J., Palotay, J. L., Beckett, R. S. & Brochu, L. (1993). The Systematised Nomenclature of Human and Veterinary Medicine: SNOMED International. Northfield: College of American Pathologists. Del´eger, L. & Zweigenbaum, P. (2008). Paraphrase acquisition from comparable medical corpora of specialized and lay texts. In Ann Symp Am Med Inform Assoc (AMIA), pp. 146–50. Dohmes, P., Zwitserlood, P. & B¨olte, J. (2004).45/45 Automatic text simplification in biomedical domain Natalia Grabar
  • 51. Context Difficulty Paraphrases Conclusion The impact of semantic transparency of morphologically complex words on picture naming. Brain and Language, 90(1-3), 203–212. Elhadad, N. & Sutaria, K. (2007). Mining a lexicon of technical terms and lay equivalents. In BioNLP, pp. 49–56. Flesch, R. (1948). A new readability yardstick. Journ Appl Psychol, 23, 221–233. Frisson, S., Niswander-Klement, E. & Pollatsek, A. (2008). The role of semantic transparency in the processing of english compound words. Br J Psychol, 99(1), 87–107. Glavas, G. & Stajner, S. (2015). Simplifying lexical simplification: Do we need simplified corpora? In ACL-COLING, pp. 63–68. Goeuriot, L., Grabar, N. & Daille, B. (2007). Caract´erisation des discours scientifique et vulgaris´e en fran¸cais, japonais et russe. In Traitement Automatique des Langues Naturelles (TALN), pp. 93–102. Grabar, N., Farce, E. & Sparrow, L. (2018). ´Etude de la lisibilit´e des documents de sant´e avec des m´ethodes d’oculom´etrie. In Traitement Automatique des Langues Naturelles (TALN), pp. 1–14. 45/45 Automatic text simplification in biomedical domain Natalia Grabar
  • 52. Context Difficulty Paraphrases Conclusion Grabar, N. & Hamon, T. (2014). Automatic extraction of layman names for technical medical terms. In ICHI 2014, Pavia, Italy. Grabar, N. & Hamon, T. (2016). Exploitation de la morphologie pour l’extraction automatique de paraphrases grand public des termes m´edicaux. TAL, 57(1), 85–109. Grabar, N., Hamon, T. & Amiot, D. (2014). Automatic diagnosis of understanding of medical words. In EACL PITR Workshop, pp. 11–20. Grabar, N., Krivine, S. & Jaulent, M. (2007). Classification of health webpages as expert and non expert with a reduced set of cross-language features. In Ann Symp Am Med Inform Assoc (AMIA), pp. 284–288. Gunning, R. (1973). The art of clear writing. New York, NY: McGraw Hill. Jarema, G., Busson, C., Nikolova, R., Tsapkini, K. & Libben, G. (1999). Processing compounds: A cross-linguistic study. Brain and Language, 68(1-2), 362–369. Kim, Y.-S., Hullman, J., Burgess, M. & Adar, E. (2016). Simplescience: Lexical simplification of scientific terminology. In EMNLP, pp. 1–6.45/45 Automatic text simplification in biomedical domain Natalia Grabar
  • 53. Koester, D. & Schiller, N. O. (2011). The functional neuroanatomy of morphology in language production. NeuroImage, 55(2), 732–741.
    Kokkinakis, D. & Toporowska Gronostaj, M. (2006). Comparing lay and professional language in cardiovascular disorders corpora. In A. Pham T., James Cook University, Ed., WSEAS Transactions on BIOLOGY and BIOMEDICINE, pp. 429–437.
    Laurent, D., Nègre, S. & Séguéla, P. (2009). L’analyseur syntaxique Cordial dans Passage. In Traitement Automatique des Langues Naturelles (TALN).
    Le Bot, M.-C., Schuwer, M. & Richard, É. (dir.) (2008). La reformulation : Marqueurs linguistiques – Stratégies énonciatives. Rennes: Rivages linguistiques.
    Leroy, G., Helmreich, S., Cowie, J., Miller, T. & Zheng, W. (2008). Evaluating online health information: Beyond readability formulas. In Ann Symp Am Med Inform Assoc (AMIA), pp. 394–8.
    Libben, G., Gibson, M., Yoon, Y. B. & Sandra, D. (2003). Compound fracture: The role of semantic transparency and morphological headedness. Brain and Language, 84(1), 50–64.
  • 54. Lüttmann, H., Zwitserlood, P. & Bölte, J. (2011). Sharing morphemes without sharing meaning: Production and comprehension of German verbs in the context of morphological relatives. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 65(3), 173–191.
    McCray, A. (2005). Promoting health literacy. J of Am Med Infor Ass, 12, 152–163.
    McCray, A., Loane, R., Browne, A. & Bangalore, A. (1999). Terminology issues in user access to web-based medical information. In Ann Symp Am Med Inform Assoc (AMIA), pp. 107–7.
    Namer, F. (2003). Automatiser l’analyse morpho-sémantique non affixale: le système DériF. Cahiers de Grammaire, 28, 31–48.
    Patel, V., Branch, T. & Arocha, J. (2002). Errors in interpreting quantities as procedures: The case of pharmaceutical labels. Int Journ Med Inform, 65(3), 193–211.
    Péry-Woodley, M. & Rebeyrolle, J. (1998). Domain and genre in sublanguage text: definitional microtexts in three corpora. In LREC, pp. 987–992.
    Poprat, M., Markó, K. & Hahn, U. (2006). A language classifier that automatically divides medical documents for experts and health care consumers. In Int Congress of the European Federation for Medical Informatics, pp. 503–508, Maastricht.
  • 55. Quinlan, J. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
    Specia, L., Jauhar, S. & Mihalcea, R. (2012). Semeval-2012 task 1: English lexical simplification. In *SEM 2012, pp. 347–355.
    Tapi Nzali, M., Bringay, S., Lavergne, C., Opitz, T., Azé, J. & Mollevi, C. (2015). Construction d’un vocabulaire patient/médecin dédié au cancer du sein à partir des médias sociaux. In IC 2015.
    Tran, T., Chekroud, H., Thiery, P. & Julienne, A. (2009). Internet et soins : un tiers invisible dans la relation médecine/patient ? Ethica Clinica, 53, 34–43.
    Wang, Y. (2006). Automatic recognition of text difficulty from consumers health information. In IEEE, Ed., Computer-Based Medical Systems, pp. 131–136.
    Williams, M., Parker, R., Baker, D., Parikh, N., Pitkin, K., Coates, W. & Nurss, J. (1995). Inadequate functional health literacy among patients at two public hospitals. JAMA, 274(21), 1677–1682.
  • 56. Wubben, S., van den Bosch, A. & Krahmer, E. (2012). Sentence simplification by monolingual machine translation. In Annual Meeting of the Association for Computational Linguistics, pp. 1015–1024.
    Yatskar, M., Pang, B., Danescu-Niculescu-Mizil, C. & Lee, L. (2010). For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia. In NAACL, pp. 365–368.
    Zeng, Q. & Tse, T. (2006). Exploring and developing consumer health vocabularies. JAMIA, 13, 24–29.
    Zeng, Q. T., Tse, T., Divita, G., Keselman, A., Crowell, J. & Browne, A. C. (2006). Exploring lexical forms: first-generation consumer health vocabularies. In Ann Symp Am Med Inform Assoc (AMIA), pp. 1155–1155.
    Zeng-Treitler, Q., Kim, H., Goryachev, S., Keselman, A., Slaughter, L. & Smith, C. (2007). Text characteristics of clinical reports and their implications for the readability of personal health records. In MEDINFO, pp. 1117–1121, Brisbane, Australia.
    Zheng, W., Milios, E. & Watters, C. (2002). Filtering for medical news items using a machine learning approach. In Ann Symp Am Med Inform Assoc (AMIA), pp. 949–53.
  • 57. Zhu, Z., Bernhard, D. & Gurevych, I. (2010). A monolingual tree-based translation model for sentence simplification. In COLING 2010, pp. 1353–1361.