Shibut poster i11 168

Selection and Aggregation of Sentences
in the Knowledge Formation Process
M.S. Shibut, V.S. Yakovishin
The Academy of Public Administration under the aegis of the President of the Republic of Belarus,
17, Moskovskaya Str., 220007, Minsk, Republic of Belarus, m_shibut@pac.by,
http://pac.by/en
Let $S_1, S_2, S_3, S_4, S_5$ be sentences, expressed in terms of the formal language, as shown in the figure below,
where a, in, o are signs of the secondary sentence parts, and p, pt, pPs are signs of the different predicates (for
the present, past indefinite, and present simple passive, respectively).
According to the selection rule, the first sentence must be eliminated because of the intensional superiority of
the second sentence ($S_1 \subseteq S_2$). The sentences $S_2, S_3, S_4, S_5$ can be integrated in compliance with the
aggregation rule. Let “man”, “young man”, “library” be the subjects contained in the user's request. Then, as a
result of integration on the given subjects, the following three subject knowledge descriptions can be
obtained: $s(\{\text{man}\}) = \{S_2, S_3, S_5\}$, $s(\{\text{man}, \text{man\_a.young}\}) = \{S_2, S_5\}$, $s(\{\text{library}\}) = \{S_2, S_4\}$.
Knowledge-based text adaptation. The subject knowledge formation can be used as a basis for automatic
creation (compiling) of adapted (user-oriented) text materials, such as
- various information-analytical reviews;
- individual electronic textbooks;
- any other adapted text materials.
Knowledge-based information search. The information search can be realized as a two-stage process
(that resembles ore processing):
- data search: the usual information retrieval is realized to draw information (as full as possible) from a
number of sources;
- knowledge search (“ore dressing”): the obtained results are processed to extract only the important
information (“valuable elements”).
Knowledge-based machine translation. In translating a source text from one natural language to
another, the subject knowledge base (where the lexical compatibility is fixed) can be used as a supporting
interlingua that plays the role of an effective filter for screening out the misplaced meanings of polysemous
words.
Knowledge formation is presented as a process of selection and aggregation of input sentences. In
this process, the text sentences are first transformed into the formal language and then
integrated into the knowledge representation. The integration of the sentences that have one and the same
subject is considered a subject knowledge representation, and any collection of subject knowledge
representations produced in the knowledge formation process is considered a user-oriented (“highly
tailored”) description of the subject field. It is assumed that the subject (usually characterized as “the
something or someone that the sentence is about”, “the thing being talked about”) is expressed by a
grammatically separated noun phrase that represents either the absolutely independent part of the sentence
(the formal subject of the subject-predicate division) or the general determinative part, i.e. the attribute that
relates to the whole sentence (the actual subject of the theme-rheme division, also known as
topic-comment, representing the “reflection of the speaker's attitude towards what is said”).
The knowledge formation method presented here is based on a special formal language. In
the formal language, input text sentences are expressed in set-theoretical (parenthesis-free, “discrete”)
form as sets of their syntactic elements (syntagmes), which allows us to reduce the semantic identification
of sentences to the standard set-theoretical relation of inclusion.
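The reduction to set inclusion can be illustrated with a minimal Python sketch. It is not the authors' implementation: representing a sentence as a frozenset of strings and the helper name `is_covered_by` are assumptions made for illustration, while the syntagme strings are taken from the poster's example.

```python
# Sentences as sets of syntagmes written in the poster's formal language.
S1 = frozenset({"man", "man_a.young", "man_p.read", "read_o.book"})
S2 = frozenset({"man", "man_a.young", "man_p.read", "read_o.book", "read_in.library"})
S3 = frozenset({"man", "man_pt.walk", "walk_in.park"})


def is_covered_by(a: frozenset, b: frozenset) -> bool:
    """Hypothetical helper: True if every syntagme of `a` also occurs in `b`."""
    return a <= b  # standard set inclusion


print(is_covered_by(S1, S2))  # True
print(is_covered_by(S3, S2))  # False
```

With this encoding, comparing the content of two sentences needs no parsing of nested structures, only a subset test.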
Subject knowledge formation is a growth process in which two formation rules, namely the rules of
selection and aggregation of sentences, must be applied.
Selection rule: one of the sentences $S_1$ and $S_2$ must be eliminated if it is a subset of the other sentence, i.e.
$\{S_1, S_2\} \to S_1$, if $S_1 \supseteq S_2$.
Aggregation rule realizes the integration of already selected sentences: if $S_1, S_2, \ldots$ are sentences that
have the same subject $N$, they are united in a subject knowledge representation, i.e.
$\{S_1, S_2, \ldots\} \to s(N)$.
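The selection rule can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' code: the function name `select`, the dictionary of labelled sentences, and the choice to keep the first of two identical sentences are all hypothetical; the four encoded sentences are the ones listed in the poster's figure.

```python
from typing import Dict, FrozenSet

Sentence = FrozenSet[str]


def select(sentences: Dict[str, Sentence]) -> Dict[str, Sentence]:
    """Drop every sentence whose syntagme set is included in another sentence."""
    labels = list(sentences)
    kept = {}
    for i, label in enumerate(labels):
        s = sentences[label]
        # A sentence is eliminated if it is a proper subset of another one,
        # or an exact duplicate of a sentence that appeared earlier (assumption).
        covered = any(
            s < sentences[other] or (s == sentences[other] and j < i)
            for j, other in enumerate(labels)
            if j != i
        )
        if not covered:
            kept[label] = s
    return kept


sentences = {
    "S1": frozenset({"man", "man_a.young", "man_p.read", "read_o.book"}),
    "S2": frozenset({"man", "man_a.young", "man_p.read", "read_o.book", "read_in.library"}),
    "S3": frozenset({"man", "man_pt.walk", "walk_in.park"}),
    "S4": frozenset({"library", "library_pPs.situate", "situate_in.street", "street_a.graceful"}),
}

print(list(select(sentences)))  # ['S2', 'S3', 'S4']
```

The pairwise comparison is quadratic in the number of sentences, which is acceptable for a sketch.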
Subject knowledge representation is a set $s(N)$ of sentences $S_1, S_2, \ldots$ with the common subject,
represented by a noun phrase $N$ (contained in the user's request), i.e.
$s(N) = \{S_i \mid S_i \supseteq N,\ i \geq 1\}$.
Subject field representation is any collection $s(N_1, N_2, \ldots)$ of subject knowledge representations produced
in the knowledge formation process, i.e.
$s(N_1, N_2, \ldots) = \{s(N_1), s(N_2), \ldots\}$,
where $N_1, N_2, \ldots$ are noun phrases that play the role of subjects in the division “subject-predicate” or in the
actual division “theme-rheme”.
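The subject knowledge representation and the subject field representation can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: the function names `aggregate` and `subject_field` are hypothetical, a subject matches a sentence by literal set inclusion, and only sentences 2, 3 and 4 are used because they are the ones whose formal encodings are listed on the poster. The poster's own result for “library” also includes sentence 2, where “library” occurs only inside read_in.library; that case needs a matching rule beyond literal inclusion and is not covered here.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Sentence = FrozenSet[str]


def aggregate(sentences: Dict[str, Sentence], subject: Iterable[str]) -> List[str]:
    """s(N): labels of the sentences whose syntagme set includes the subject N."""
    n = frozenset(subject)
    return [label for label, sent in sentences.items() if sent >= n]


def subject_field(
    sentences: Dict[str, Sentence], subjects: Iterable[Tuple[str, ...]]
) -> Dict[Tuple[str, ...], List[str]]:
    """s(N_1, N_2, ...): one subject knowledge representation per subject."""
    return {tuple(n): aggregate(sentences, n) for n in subjects}


# Sentences that remain after the selection rule (encodings from the poster).
selected = {
    "S2": frozenset({"man", "man_a.young", "man_p.read", "read_o.book", "read_in.library"}),
    "S3": frozenset({"man", "man_pt.walk", "walk_in.park"}),
    "S4": frozenset({"library", "library_pPs.situate", "situate_in.street", "street_a.graceful"}),
}

field = subject_field(selected, [("man",), ("man", "man_a.young")])
print(field[("man",)])                # ['S2', 'S3']
print(field[("man", "man_a.young")])  # ['S2']
```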
Syntagme (as in The new book):
$(X_1 \Delta X_2) = \{X_1 \Delta X_2\}$
Stepwise subordination (as in The book of the new author):
$(X_1 \Delta_1 (X_2 \Delta_2 X_3)) = \{X_1 \Delta_1 X_2,\ X_2 \Delta_2 X_3\}$
Collateral subordination (as in The new book of the author):
$((X_1 \Delta_1 X_2) \Delta_2 X_3) = \{X_1 \Delta_1 X_2,\ X_1 \Delta_2 X_3\}$
Multisyntagme (as in The new and old books):
$(X_1 \Delta (X_2 \nabla X_3)) = \{X_1 \Delta X_2,\ X_1 \Delta X_3\}$
Subject (absolutely independent part) (as in The man reads a book):
$((X_1 \Delta_1 X_2) \Delta_2 X_3) = \{X_1,\ X_1 \Delta_1 X_2,\ X_1 \Delta_2 X_3\}$
Theme (topic) (as in In the evening, the man reads a book):
$((X_1 \Delta_1 (X_2 \Delta_2 X_3)) \Delta_3 X_4) = \{\Delta_3 X_4,\ X_1,\ X_1 \Delta_1 X_2,\ X_2 \Delta_2 X_3\}$
[Figure: dependency diagrams for the examples above, labelling head members, dependent members, homogeneous parts, subject, and theme.]
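The decomposition rules for the syntagme, stepwise subordination, collateral subordination, and multisyntagme can be sketched in Python. This is a hedged illustration, not the authors' algorithm: the classes `Sub` and `Coord`, the functions `head_words` and `decompose`, and the rendering of a syntagme as `head_link.dependent` (modelled on strings such as man_a.young) are assumptions. The subject and theme rules, which add standalone elements, are not covered.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Sub:
    """Subordination: `dep` depends on `head` through the sign `link`."""
    head: "Node"
    link: str
    dep: "Node"


@dataclass(frozen=True)
class Coord:
    """Coordination of homogeneous parts."""
    parts: Tuple["Node", ...]


Node = Union[str, Sub, Coord]


def head_words(node: Node) -> List[str]:
    """Words to which an outer link attaches."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, Sub):
        return head_words(node.head)
    return [w for part in node.parts for w in head_words(part)]


def decompose(node: Node) -> Set[str]:
    """Flatten a nested construction into a set of binary syntagmes."""
    if isinstance(node, str):
        return set()
    if isinstance(node, Coord):
        return set().union(*(decompose(p) for p in node.parts))
    pairs = {
        f"{h}_{node.link}.{d}"
        for h in head_words(node.head)
        for d in head_words(node.dep)
    }
    return pairs | decompose(node.head) | decompose(node.dep)


# The new book of the author (collateral subordination)
print(sorted(decompose(Sub(Sub("book", "a", "new"), "of", "author"))))
# ['book_a.new', 'book_of.author']
```

Nesting on the dependent side yields stepwise subordination, nesting on the head side yields collateral subordination, and a `Coord` dependent yields one syntagme per homogeneous part.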
Input sentences
1. The young man reads a book.
2. The young man reads a book in the library.
3. The man walked in the park.
4. The library is situated in a graceful street.
5. The young man kicked the ball.
…
Knowledge representation
1. man, man_a.young, man_p.read, read_o.book
2. man, man_a.young, man_p.read, read_o.book, read_in.library
3. man, man_pt.walk, walk_in.park
4. library, library_pPs.situate, situate_in.street, street_a.graceful
…
Knowledge representation (after the selection rule)
2. man, man_a.young, man_p.read, read_o.book, read_in.library
3. man, man_pt.walk, walk_in.park
4. library, library_pPs.situate, situate_in.street, street_a.graceful
…
Knowledge representation for “library” (aggregation rule, query “library”)
4. library, library_pPs.situate, situate_in.street, street_a.graceful
2. man, man_a.young, man_p.read, read_o.book, read_in.library
…
Knowledge representation for “man” (aggregation rule, query “man”)
2. man, man_a.young, man_p.read, read_o.book, read_in.library
3. man_pt.walk, walk_in.park
…
User-oriented description of subject field (query “library”)
2. The library is situated in a graceful street.
User-oriented description of subject field (query “man”)
2. The man walked in the park.
4. The young man kicked the ball.
The described research was supported by the research program on the Development of the State System of
Scientific and Technical Information of the Republic of Belarus for 2009-2010, task No 3.3, sponsored by
the State Committee for Science and Technology of the Republic of Belarus.
We are pleased to thank Prof. Rauf Sadykhov and Prof. Anatoly Sachenko for their assistance. We are also
very grateful to Dr. Iryna Turchenko for the presentation of our paper.
[Figure stages: transformation into the formal language, followed by knowledge formation.]
