Varvara Krayvanova - Automatic selection of verbs-markers for segmentation task of process descriptions in natural language texts
1. Varvara Krayvanova, AltSTU, Barnaul
krayvanova@gmail.com
Problem and tasks
What do we want to have?
What do we need to do?
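[Diagram: a long scientific text is turned by "MAGIC" into wikification, an ontology, illustrations (IDEF0, use case and activity diagrams) and slides for lectures.]
[Diagram: the long scientific text is split into fragments (definitions, natural process, methodology), which special algorithms turn into an ontology, IDEF0 and use case diagrams, and natural process descriptions.]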
Model of text fragmentation
How can we split a text into semantic fragments?
$T = \{s_k\}$ is a natural-language text, where $s_k$ is the $k$-th sentence of the text.
The window $W_{i,j} = s_i, \ldots, s_j$ is a continuous sequence of sentences of the text:
• $i$ is the number of the first sentence,
• $j$ is the number of the last sentence,
• $L = j - i$ is the window size.
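A minimal Python sketch of window enumeration, assuming sentences are numbered from 0 and, following the definition $L = j - i$, a window of size $L$ spans $L + 1$ sentences. The function name `windows` is hypothetical.

```python
from typing import List, Tuple


def windows(num_sentences: int, size: int) -> List[Tuple[int, int]]:
    """Return every window W_{i,j} with L = j - i = size as an (i, j) pair.

    Sentences are numbered from 0 and j is inclusive, so each window
    covers size + 1 sentences. A text shorter than the window yields [].
    """
    if size < 0:
        raise ValueError("window size must be non-negative")
    return [(i, i + size) for i in range(num_sentences - size)]


# A 5-sentence text with L = 2 has three windows.
print(windows(5, 2))  # [(0, 2), (1, 3), (2, 4)]
```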
We take all windows of size $L$.
For each window we calculate:
(1) the total count of nouns, $P_{Noun}$;
(2) the count of different nouns, $P_{DiffNoun}$;
(3) the total count of verbs, $P_{Verb}$;
(4) the count of different verbs, $P_{DiffVerb}$;
(5) the total count of adjectives, $P_{Adj}$;
(6) the count of different adjectives, $P_{DiffAdj}$.
Then we cluster the set of windows using these parameters.
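The feature computation and clustering step can be sketched as below. This is an illustration under assumptions, not the author's implementation: each sentence is taken to be a list of hypothetical `(lemma, POS)` pairs with tags `NOUN`, `VERB` and `ADJ`, and plain k-means stands in for the clustering method, which the slides do not name.

```python
from typing import List, Sequence, Tuple

Sentence = List[Tuple[str, str]]  # hypothetical input: (lemma, POS tag) pairs


def window_features(sentences: Sequence[Sentence], i: int, j: int) -> List[int]:
    """Counts for W_{i,j}: P_Noun, P_DiffNoun, P_Verb, P_DiffVerb, P_Adj, P_DiffAdj."""
    tokens = [tok for s in sentences[i:j + 1] for tok in s]
    feats: List[int] = []
    for pos in ("NOUN", "VERB", "ADJ"):
        lemmas = [lemma for lemma, tag in tokens if tag == pos]
        feats += [len(lemmas), len(set(lemmas))]  # total count, distinct count
    return feats


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means (a stand-in; the slides do not name the clustering method).

    Centroids start at the first k points, so the result is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each window to its nearest centroid (squared Euclidean distance).
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


text = [
    [("snow", "NOUN"), ("form", "VERB"), ("cold", "ADJ")],
    [("snow", "NOUN"), ("be", "VERB")],
    [("value", "NOUN"), ("calculate", "VERB"), ("value", "NOUN")],
    [("table", "NOUN"), ("calculate", "VERB")],
]
L = 1
wins = [(i, i + L) for i in range(len(text) - L)]
X = [window_features(text, i, j) for i, j in wins]
print(X)             # [[2, 1, 2, 2, 1, 1], [3, 2, 2, 2, 0, 0], [3, 2, 2, 1, 0, 0]]
print(kmeans(X, 2))  # [0, 1, 1]
```

Raw counts are used as-is here; with real texts one would likely scale the six features before clustering.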
OK, here is what we get:
I can read it, but I don't want to.
What about this fragment?
Each sentence $s_k$ is assigned to some cluster $c$
from a finite set of clusters $C$.
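The slides do not say how window clusters become sentence clusters. One possible rule, sketched here purely as a hypothetical assumption, is a majority vote over all windows that contain the sentence, with ties going to the smaller cluster label.

```python
from collections import Counter
from typing import Dict, List, Tuple


def sentence_clusters(num_sentences: int, wins: List[Tuple[int, int]],
                      labels: List[int]) -> Dict[int, int]:
    """Assign sentence k to the most frequent cluster among windows with i <= k <= j.

    Hypothetical voting rule; ties go to the smaller cluster label.
    """
    votes: Dict[int, Counter] = {k: Counter() for k in range(num_sentences)}
    for (i, j), c in zip(wins, labels):
        for k in range(i, j + 1):
            votes[k][c] += 1
    result: Dict[int, int] = {}
    for k, counter in votes.items():
        if counter:  # sentences covered by no window stay unassigned
            best = max(counter.values())
            result[k] = min(c for c, n in counter.items() if n == best)
    return result


wins = [(0, 1), (1, 2), (2, 3)]
labels = [0, 1, 1]
print(sentence_clusters(4, wins, labels))  # {0: 0, 1: 0, 2: 1, 3: 1}
```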
Verb nest
How can we detect the fragment type?
$V_k$ is the set of verbs in sentence $s_k$.
$E_v = \{ s_k \mid v \in V_k \}$ is the ordered list of sentences that contain verb $v$.
• $V_{unic}$: rare verbs, whose $|E_v|$ is below a fixed threshold.
• $V_{common}$: common verbs.
• $V_{marker}$: verbs-markers.
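Building $E_v$ and separating the rare verbs $V_{unic}$ could look like this. `min_count` is a hypothetical name for the rarity threshold, and in this sketch the remaining verbs are simply treated as candidates for $V_{common}$ and $V_{marker}$.

```python
from typing import Dict, List, Set, Tuple


def verb_occurrences(verbs_per_sentence: List[Set[str]]) -> Dict[str, List[int]]:
    """E_v: for each verb v, the ordered list of sentence indices k with v in V_k."""
    occ: Dict[str, List[int]] = {}
    for k, verbs in enumerate(verbs_per_sentence):
        for v in sorted(verbs):
            occ.setdefault(v, []).append(k)
    return occ


def split_rare(occ: Dict[str, List[int]], min_count: int) -> Tuple[Set[str], Set[str]]:
    """Split verbs into V_unic (|E_v| < min_count) and the remaining candidates.

    `min_count` is a hypothetical name for the threshold.
    """
    rare = {v for v, ks in occ.items() if len(ks) < min_count}
    return rare, set(occ) - rare


V = [{"form", "be"}, {"be"}, {"calculate"}, {"calculate", "be"}]
E = verb_occurrences(V)
print(E["be"])  # [0, 1, 3]
rare, candidates = split_rare(E, 2)
```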
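$T_v$ is the textual neighborhood of $v$: the set of sentences $s_i$ that lie within a fixed distance of some $s_k \in E_v$, i.e. $|k - i|$ does not exceed that distance.
$c_v = T_v \cap c$ is the part of this neighborhood that falls into cluster $c$.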
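A sketch of the textual neighborhood $T_v$ and its cluster part $c_v$, with sentences represented by their indices. `radius` is a hypothetical name for the distance bound on $|k - i|$, and `cluster_of` (sentence index to cluster) is an assumed input format.

```python
from typing import Dict, List, Set


def neighborhood(occurrences: List[int], num_sentences: int, radius: int) -> Set[int]:
    """T_v: indices i with |k - i| <= radius for some k in E_v (clipped to the text)."""
    t: Set[int] = set()
    for k in occurrences:
        t.update(range(max(0, k - radius), min(num_sentences, k + radius + 1)))
    return t


def cluster_part(t_v: Set[int], cluster_of: Dict[int, int], c: int) -> Set[int]:
    """c_v = T_v ∩ c: the sentences of T_v assigned to cluster c."""
    return {i for i in t_v if cluster_of.get(i) == c}


T_be = neighborhood([0, 1, 3], num_sentences=6, radius=1)
print(sorted(T_be))  # [0, 1, 2, 3, 4]
clusters = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0}
print(sorted(cluster_part(T_be, clusters, 1)))  # [2, 3, 4]
```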
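The verb $v_m$ is a marker of cluster $c$ if:
• $|c_{v_m}| / |T_{v_m}|$ is above a threshold, i.e. most of its neighborhood lies in cluster $c$;
• for the other clusters $a \in C$, the share $|a_{v_m}| / |T_{v_m}|$ does not exceed a second threshold.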
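The marker test can be sketched as follows. `alpha` and `beta` are hypothetical names for the two thresholds, and the second condition is read here as "every other cluster $a \ne c$ holds at most a `beta` share of $T_{v_m}$", which is an assumption.

```python
from collections import Counter
from typing import Dict, Set


def is_marker(t_v: Set[int], cluster_of: Dict[int, int], c: int,
              alpha: float, beta: float) -> bool:
    """Check whether a verb with neighborhood T_v marks cluster c.

    alpha, beta: hypothetical threshold names. Condition 2 is read as
    "every other cluster a != c holds at most a beta share of T_v".
    """
    if not t_v:
        return False
    shares = Counter(cluster_of[i] for i in t_v if i in cluster_of)
    n = len(t_v)
    if shares[c] / n <= alpha:  # condition 1: |c_v| / |T_v| must exceed alpha
        return False
    return all(count / n <= beta for a, count in shares.items() if a != c)


clusters = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0}
T_calc = {2, 3, 4}
T_be = {0, 1, 2, 3, 4}
print(is_marker(T_calc, clusters, 1, alpha=0.6, beta=0.3))  # True
print(is_marker(T_be, clusters, 1, alpha=0.6, beta=0.3))    # False
```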
$N_{v_m} = \{ v \mid E_v \subseteq T_{v_m} \}$ is the text nest of the verb-marker $v_m$.
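Finally, a sketch of the text nest, assuming the relation between $E_v$ and $T_{v_m}$ is set inclusion (every occurrence of $v$ lies in the marker's neighborhood). This reading and the function name `text_nest` are assumptions.

```python
from typing import Dict, List, Set


def text_nest(marker_neighborhood: Set[int], occ: Dict[str, List[int]]) -> Set[str]:
    """N_{v_m}: verbs whose occurrence list E_v lies entirely inside T_{v_m}.

    Reading the relation as set inclusion is an assumption.
    """
    return {v for v, ks in occ.items() if ks and set(ks) <= marker_neighborhood}


E = {"be": [0, 1, 3], "form": [0], "calculate": [2, 3], "record": [3, 4]}
T_calc = {2, 3, 4}
print(sorted(text_nest(T_calc, E)))  # ['calculate', 'record']
```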
Cluster annotation (expert) and verbs-markers (automatic extraction):
Cluster 1. Description of the research objects: introductory definitions and the process of snow formation.
Verbs-markers: СЛУЖИТЬ, СМОТРЕТЬ, ЗАВИСЕТЬ, ЯВЛЯТЬСЯ, ОПРЕДЕЛЯТЬ, ИМЕТЬ, ПРОИСХОДИТЬ (TO SERVE, TO WATCH, TO DEPEND, TO BE, TO DEFINE, TO HAVE, TO HAPPEN).
Cluster 2. Chapter about calculations and laboratory processing of research results, various classification tables, fragments about parameter measurement.
Verbs-markers: ВЫЧИСЛЯТЬ, ВЫЧИСЛЯТЬСЯ, ЗАПИСЫВАТЬСЯ (TO CALCULATE, TO BE CALCULATED, TO BE RECORDED).
Cluster 3. Observation methodology: marking of observation areas, equipment and recommendations.
Verbs-markers: СОСТОЯТЬ, ПРИНИМАТЬСЯ, ИСПОЛЬЗОВАТЬ, РЕКОМЕНДОВАТЬ, БЫТЬ (TO CONSIST, TO BE TAKEN, TO USE, TO RECOMMEND, TO BE as a link verb).