Concepts in web ontologies help machines to understand data through the meanings they hold. Furthermore, learning the contexts and topics of web documents has also helped in better semantic-oriented structuring and retrieval of data on the web. In this short paper we present a novel approach for domain-independent open learning of the domain concepts, context, and topic of any given web document. Our approach is based on a computational version of the Construction-Integration (CI) model of text comprehension. Our proposed system mimics the way humans learn the meanings of textual units and identifies domain concepts, contexts, and topics in the form of semantic networks. We apply our system to a number of web documents with a range of topics and domains. The resulting semantic networks provide quantitative and qualitative insights into the nature of the given web documents.
1. Using Text Comprehension Model for
Learning Concepts, Context, and Topic
of Web Content
11th International Conference on Semantic Computing
IEEE ICSC 2017 - San Diego, California, USA
Jan 30-Feb 1, 2017
Ismael Ali, Naser Al Madi, Austin Melton
Department of Computer Science
Kent State University
2. Outline
• Text Comprehension
• System Architecture and Workflow
• Semantic Learning
– Semantic Network Construction
– Mathematical Foundation
– Domain Concept Learning
– Topic Learning
– Context Learning
• Experimental Design
• Evaluation Strategy
• Results
• Conclusion and Future Work
3. Abstract
• Role of learning semantics, including concepts, contexts, and
topics, from web documents:
– semantic-based structuring and retrieval
• We present a novel approach for domain-independent
semantic learning.
• Our approach uses a computational version of the
Construction-Integration (CI) model of text comprehension.
4. Text Comprehension
• Comprehension is a cognitive-based learning process
• Comprehension produces mental representations:
– perceptual
– verbal
– semantic
• The CI model simulates the incremental and dynamic task of
comprehending text, and it leads to the construction of a
semantic network (SN)
5. CI as a Cognitive Model of Text Comprehension
• [Figure from: Cathleen Wharton and Walter Kintsch, 1991, ACM SIGART Bulletin, showing the Surface Model, the Text-Base Model, and the Situation Model]
• Situation Model:
– time of acquisition
– recognizing main concepts
– integrating them with background knowledge
6. System Architecture and Workflow
• Preprocessing using Stanford CoreNLP:
1. Text tokenization
2. Lemmatization
3. Sentence splitting (to get the Surface Model)
4. Part-of-speech tagging
5. Anaphora resolution
• Running the computational CI model to produce a weighted semantic network
• Analysis and filtering of the weighted semantic networks
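The preprocessing stage above can be sketched as follows. This is a minimal stand-in, not the authors' pipeline: the slide uses Stanford CoreNLP for lemmatization, POS tagging, and anaphora resolution, which are not reproduced here; the splitting and tokenization rules below are simplifying assumptions.

```python
import re

def preprocess(text):
    """Simplified stand-in for the CoreNLP preprocessing stage:
    sentence splitting and tokenization only (steps 1 and 3).
    Lemmatization, POS tagging, and anaphora resolution would come
    from Stanford CoreNLP itself and are omitted in this sketch."""
    # Naive sentence splitting on terminal punctuation.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    # Naive tokenization: lowercase alphabetic tokens only.
    return [re.findall(r"[a-z]+", s.lower()) for s in sentences]

# Each sentence becomes one reading episode (see slide 7).
episodes = preprocess("Knowledge is a familiarity. Awareness or "
                      "understanding of something. Such as facts.")
```

Each inner list then feeds the CI model as one episode.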
7. Semantic Network Construction
• Sentences are presented as single units of time (a reading
episode)
• Example: “Knowledge is a familiarity. Awareness or understanding of
something. Such as facts.”
• [Fig. 2: Sample concept network after running the CI model; the legend distinguishes recognized vs. neglected concepts and recognized vs. neglected associations]
8. Semantic Network Construction
• “Knowledge is a familiarity. Awareness or understanding of something. Such
as facts.”
• Episodes {e1, e2, ..., ei} are background knowledge for episode ei+1
• Weights on edges represent the semantic association strength
• [Fig. 2: Sample concept network after running the CI model]
1. The concept recognition threshold (S) is 7 for Fig. 2
– s(“something”) = 6
– e1 + e2 < S
– s(“Awareness”) = 12
– e3 + e4 > S
2. The association recognition threshold (I) is 5 for Fig. 2
– i(“Knowledge”, “facts”) < I
– i(“Knowledge”, “Awareness”) > I
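The threshold test on this slide can be sketched as below. The strengths for "Knowledge" and "facts" and the association values are hypothetical placeholders; only s("something") = 6, s("Awareness") = 12, S = 7, and I = 5 come from the slide.

```python
# Recognition thresholds from slide 8.
S, I = 7, 5

# Activation strengths after running the CI model; values for
# "Knowledge" and "facts" are made up for illustration.
concept_strength = {"Knowledge": 14, "Awareness": 12, "facts": 9, "something": 6}
assoc_strength = {("Knowledge", "Awareness"): 8, ("Knowledge", "facts"): 3}

# A concept is recognized when its strength exceeds S; an association
# when its strength exceeds I and both endpoints are recognized.
recognized_concepts = {c for c, s in concept_strength.items() if s > S}
recognized_assocs = {pair for pair, w in assoc_strength.items()
                     if w > I and set(pair) <= recognized_concepts}
```

Here "something" (6 < S) and the ("Knowledge", "facts") association (3 < I) end up neglected, matching the slide's example.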
9. Semantic Network Construction:
Semantic Association Graph
1. An associative matrix is generated from the Text-base model
2. Each sentence forms an Individual Concept Network (ICN)
3. All ICN graphs are combined to create the Base Semantic Network (BSN)
• The associative matrix has one row and one column per concept (C1, C2, ..., Cn); cell (Ci, Cj) holds the sentence ID of the first episode in which Ci and Cj co-occur
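The associative matrix of slide 9 can be sketched as a dictionary keyed by concept pairs; the example episodes are illustrative, and the pair ordering is an implementation choice, not from the slides.

```python
from itertools import combinations

def associative_matrix(episodes):
    """Cell (ci, cj) holds the ID of the first sentence (episode)
    in which the two concepts co-occur, as on slide 9."""
    first_cooccurrence = {}
    for sent_id, concepts in enumerate(episodes, start=1):
        for ci, cj in combinations(sorted(set(concepts)), 2):
            # setdefault keeps the earliest episode only.
            first_cooccurrence.setdefault((ci, cj), sent_id)
    return first_cooccurrence

# Toy episodes from the slide-7 example text (content words only).
episodes = [["Knowledge", "familiarity"],
            ["Awareness", "understanding", "something"],
            ["facts"]]
matrix = associative_matrix(episodes)
```

A sparse dictionary avoids allocating the full n-by-n matrix when most concept pairs never co-occur.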
10. Semantic Network Construction:
Mathematical Foundation
• Finding weights and thresholds:
4. The BSN shows which concepts and associations were recognized and which were neglected
6. The BSN semantic network is represented as a set of inequalities:
– the inequalities set upper and lower bounds for the concept (S) and association (I) recognition thresholds
– linear programming finds suitable values for all variables to satisfy the inequalities
7. Find values for the variable vector X that satisfy the inequalities by minimizing:
min f·X subject to A·X ≤ B, LB ≤ X ≤ UB
Where:
– f is the linear objective function
– A is the left-hand side of the inequalities
– B is the right-hand side of the inequalities
– LB is the lower bound of the solution
– UB is the upper bound of the solution
• The resulting variable vector contains weights for nodes and associations, along with individual threshold (S) and (I) values for recognizing concepts and associations.
11. Domain Concept Learning
• The variable vector is used to construct the semantic network Gi = (Ci, Ei)
• Then concept filtering is performed to learn the domain concepts
• The domain concepts for web document di are the concepts in a subgraph G*i of
its semantic network Gi:
– G*i = (C*i, E*i), where C*i ⊂ Ci and E*i ⊂ Ei
• Filtering mechanisms:
(1) statistical-based filtering: mean threshold and median threshold
(2) positive-based filtering: suggested for the proposed cognitive-based
semantic learning approach
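The three filtering mechanisms can be sketched as below. The concept weights are hypothetical, and reading positive-based filtering as "keep every concept with positive weight" is an assumption about the paper's definition.

```python
from statistics import mean, median

# Hypothetical CI weights for concepts learned from one document.
weights = {"ecology": 15.0, "organism": 11.0, "environment": 9.0,
           "the": 2.0, "such": 1.0}

def filter_concepts(weights, threshold):
    """Keep the concepts whose weight exceeds the threshold."""
    return {c for c, w in weights.items() if w > threshold}

# (1) statistical-based filtering: mean and median thresholds.
mean_filtered = filter_concepts(weights, mean(weights.values()))
median_filtered = filter_concepts(weights, median(weights.values()))
# (2) positive-based filtering (assumed: keep any positive weight).
positive_filtered = filter_concepts(weights, 0.0)
```

The stricter the threshold, the smaller the resulting domain-concept set C*i.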
12. Topic Learning
• For each domain concept ci ∈ C*i in dj, calculate the Topic Identification
Weight (Tiw) from:
– CIw(ci): the weight calculated by the computational CI model
– Eigenvector(ci): the eigenvector centrality of ci, computed as a
function of the centralities of its neighbors
– e(ci): the episode in which the given concept ci first appeared
• Topic identification:
– The topic concept of di is the concept with the highest Tiw weight
– It is the most influential node in the semantic network G*i of the domain
concept set
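The eigenvector-centrality ingredient of Tiw can be sketched with power iteration; the toy network and weights are invented, and the slide does not give the exact formula combining CIw(ci), Eigenvector(ci), and e(ci), so only the centrality part is shown.

```python
import numpy as np

def eigenvector_centrality(adj, iters=100):
    """Power iteration: each node's centrality is proportional to the
    sum of its neighbors' centralities (the slide-12 definition)."""
    n = adj.shape[0]
    m = adj + np.eye(n)   # diagonal shift guarantees convergence
    x = np.ones(n)
    for _ in range(iters):
        x = m @ x
        x = x / np.linalg.norm(x)
    return x

# Toy weighted semantic network over three domain concepts.
concepts = ["knowledge", "awareness", "facts"]
adj = np.array([[0.0, 8.0, 3.0],
                [8.0, 0.0, 0.0],
                [3.0, 0.0, 0.0]])

cent = eigenvector_centrality(adj)
# Using centrality alone as a proxy for Tiw in this sketch:
topic = concepts[int(np.argmax(cent))]
```

In the full method the highest Tiw, not the highest centrality alone, selects the topic concept.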
13. Context Learning
• The context of di is the set of all nearest neighbors (nodes at
distance k = 1) of the topic concept
• Thus the context includes:
– the concepts most semantically associated with the topic concept
– a normal distribution of concepts selected from
different sections of the text
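Collecting the distance-1 neighborhood can be sketched directly over an edge list; the edges and topic below are illustrative, not from the paper's data.

```python
# Hypothetical edge list of the filtered semantic network G*_i.
edges = {("ecology", "organism"), ("ecology", "environment"),
         ("organism", "species")}
topic = "ecology"

# Context = every node adjacent to the topic concept (k = 1).
context = {c for e in edges for c in e if topic in e and c != topic}
```

Nodes like "species", two hops from the topic, are excluded from the context.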
14. Experimental Design
• A diverse set of ten randomly selected web documents
from Wikipedia:
– astronomy, brain, cognition, ecology, knowledge, law,
literacy, robotics, virus, and tennis
• Tests the openness (domain independence) property
of our approach in learning the semantics of web content
15. Evaluation Strategies
• Results of the filtering mechanisms are evaluated by a human-judgment strategy [4]:
1. A set of seven human judges (domain experts) was selected at KSU
2. The judges were asked to evaluate the list(s) of all potential concepts learned
by the CI model for each web document
3. They were then asked to identify whether each concept belonged to the given domain or not
4. Next, the domain concepts identified by the experts were compared against the
domain concepts identified by each concept filtering strategy
5. Then the quality of each concept filtering strategy was evaluated
• The evaluation was performed using the binary evaluation measures from IR: Precision, Recall,
and F1
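The binary IR measures can be sketched as set operations; the judged and learned concept sets below are invented examples, not results from the paper.

```python
# Hypothetical sets: concepts the experts judged as in-domain,
# and concepts one filtering strategy learned.
judged = {"ecology", "organism", "environment", "species"}
learned = {"ecology", "organism", "habitat"}

tp = len(judged & learned)          # true positives
precision = tp / len(learned)       # fraction of learned concepts correct
recall = tp / len(judged)           # fraction of judged concepts found
f1 = 2 * precision * recall / (precision + recall)
```

With these toy sets, precision is 2/3 and recall is 1/2; F1 is their harmonic mean.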
17. Context and Topic Analysis
• [Figures: the topic concept and the context learned for the Ecology web document]
18. Conclusion and Future Work
• We investigated a novel approach for open learning of the concepts,
contexts, and topics of web content.
• Our approach is based on the Construction-Integration (CI) model of text
comprehension, which mimics the way humans learn the semantic
components of a web document.
• We also highlighted the use of cognitive-science results in learning
semantics from web content.
• Our work is a step toward our future research on cognition-based and open:
– Ontology Learning
– Ontology Selection