An effective method for semi-automatic construction of domain module from electronic textbook

IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 08 | August-2015, Available @ http://www.ijret.org 104
AN EFFECTIVE METHOD FOR SEMI-AUTOMATIC CONSTRUCTION
OF DOMAIN MODULE FROM ELECTRONIC TEXTBOOK
Sowjanya Lakshmi Pamba1, Seetha Maddala2
1PG Scholar, Department of Computer Science Engineering, G. Narayanamma Institute of Technology and Science,
Telangana, India
2Professor, Department of Computer Science Engineering, G. Narayanamma Institute of Technology and Science,
Telangana, India
Abstract
The impact of Information and Communication Technologies on academic institutions has led to a growing need for the effective
creation and management of digital content. Technology Supported Learning (TSL) systems have proved to be beneficial in numerous
learning circumstances. Regardless of the technology or the paradigm, TSL systems (TSLSs) need an appropriate domain module. The
domain module is the pedagogical representation of the domain to be learnt and is considered the key component of any TSLS, as it
holds the information about the subject matter to be conveyed to the learner. Authoring a domain module is costly and labor
intensive. The development cost can be reduced by using semi-automatic domain module authoring approaches and
promoting the reuse of knowledge. The proposed system uses heuristic reasoning, Natural Language Processing (NLP) techniques,
and ontologies for the semi-automatic generation of the domain module from electronic textbooks. The domain module encodes
knowledge at two distinct levels: the Learning Domain Ontology (LDO), which identifies the domain topics and the pedagogical
relationships between them, and the Learning Objects (LOs), where the didactic resources used during the learning process are
identified and gathered. The system has been evaluated on an electronic textbook to test how it helps in the domain module authoring
process; the automatically generated knowledge has been compared against the domain module developed manually by
instructional designers. The approach reduces authoring time and is domain independent, as it relies
entirely on the electronic textbook provided.
Keywords: Ontology Design, Knowledge Acquisition, Pedagogical Representation, Domain Engineering.
--------------------------------------------------------------------***----------------------------------------------------------------------
1. INTRODUCTION
The advancement of technologically developed societies in the
last few years has greatly increased the influence of
Information and Communication Technologies (ICT). Online
applications have become important; they are
continuously used for communication, consulting bank
accounts, and so on. This revolution has also affected
education, providing means that enhance both teaching and
learning. Electronic textbooks are major components of
technology-based education reform. Years of research
have facilitated the development of different kinds of
TSLSs such as Learning Management Systems (LMSs),
Collaborative Learning Systems, Intelligent Tutoring
Systems (ITSs), and Adaptive and Intelligent Web-based
Educational Systems. LMSs such as Moodle or Blackboard
are currently being used at many academic institutions and
are becoming necessary for education [1]. Furthermore, a
positive relationship has been observed between student
engagement and the use of Web-based learning technology, and
encouraging learning outcomes have been recorded [2]. ITSs
have also been shown to improve the achievements of students.
In order to aid learning, TSLSs need, regardless of the
technology or the paradigm, an appropriate domain module.
The domain module is the pedagogical
representation of the domain to be learnt. It
is considered the key component of any TSLS, as it holds the
information about the subject matter to be conveyed to the
learner [3].
The domain module allows students either to learn on their
own, in the case of exploratory learning systems, or to be guided
through the learning process in instructivist
TSLSs. For example, an ITS relies on the domain module to
determine the content of the tutorial communication, to select
examples, questions and statements, and to assess the
student's performance. However, building TSLSs, especially
their domain modules, is expensive and labor intensive.
Automatic or semi-automatic construction of the domain
module for TSLSs has rarely been addressed. Anderson [3]
estimated that, in his experience developing ITSs,
50% of the effort went into developing the domain module.
Chen et al. [4] presented a system for automatically building
ITSs from machine-readable representations of textbooks,
and Lentini et al. [5] proposed an environment to build ITSs from
spreadsheets. The former requires the instructional
designers to transcribe the textbook into a formal
representation that can be processed, whereas the latter is
limited to the mathematics domain. There have been many
attempts to gather domain ontologies (semi-)automatically
from a wide range of sources (e.g., machine-readable dictionaries,
corpora, etc.). TEXCOMON [6] gathers a domain ontology
from a set of text-based LOs with the aim of enriching them
with more knowledge. The authors report several
experiments where the achieved recall varies from 86.65 to
90.84 percent for classes, 75.28 to 84.33 percent for
taxonomic relationships, and 80.08 to 93.12 percent for
conceptual relationships. TEXCOMON was also compared
with TEXT-TO-ONTO [7], which obtained success rates of 73.06
percent for classes, 47.53 percent for hierarchical
relationships, and 0.31 to 53.03 percent for conceptual
relationships. OntoLearn [8] has been used to develop
ontologies for tourism and economy. The system uses online
corpora, documents, and glossaries as the sources for the
ontology learning process, and achieved a recall of 79.3 percent
for the relationships of the tourism ontology and 45.5
percent for economy. The terminology recall was 55
percent. Mikel Larranaga et al. proposed ErauzOnt [9], a
framework used to produce new learning
objects from electronic documents.
Domain module construction is a complex task that
requires not only selecting the domain topics to be mastered,
but also defining the pedagogical relationships among the
topics, which determine how to schedule the learning sessions.
Textbook authors deal with the same difficulties
when writing their documents, which are structured to ease
comprehension and learning. Electronic textbooks can therefore be
used as the source from which to construct the domain module,
mirroring how human teachers behave when planning their
subjects: they select a set of reference books that provide the
main didactic resources (DRs), such as definitions,
examples and exercises for the subject, and rely on them for
organizing their lectures. The authors of a textbook have
to structure the document in a way that helps learning, and
provide relevant resources for learning. Therefore, these
kinds of documents can be used to build the domain module.
The proposed system performs semi-automatic generation
of the domain module from electronic textbooks using
ontologies, NLP, and heuristic reasoning. The domain module
encodes knowledge at two different levels, the LDO and a set of
LOs. Gathering the domain knowledge of a TSLS from
already existing documents might considerably reduce its
development cost. The proposed system is not aimed at
constructing an extensive domain ontology, but at supporting
the construction of a didactic ontology. The proposed system is
domain independent and relies entirely on the electronic
textbook provided.
The rest of the paper is organized as follows.
Section 2 explains the process of domain module
generation, which entails three main tasks: preparing the
document for knowledge extraction (Section 3), constructing
the ontology that describes the domain to be learned
(Section 4), and generating the LOs (Section 5). Section 6
presents the experimental analysis of the results. Finally,
conclusions and future work are presented in Section 7.
2. CONSTRUCTION OF DOMAIN MODULE
The semi-automatic generation of the domain module uses
heuristic reasoning, NLP techniques and artificial
intelligence methods. Here, the domain module encodes
knowledge at two distinct levels, the LDO and the set of LOs.
The following steps are performed to develop the domain
module (see Fig-1):
• Textbook preprocessing. First, the document is
prepared for the subsequent knowledge acquisition
processes. This phase is explained in Section 3, and
its results are then used to gather the two levels of
knowledge encoded in the domain module.
• LDO gathering. At this stage, the domain topics to be
learned and the pedagogical relationships among
them are identified and represented in the LDO. The
acquisition of the LDO is described in Section 4.
• LO gathering. The LOs (definitions, examples,
exercises, etc.) to be used during the learning
process are identified and generated. LO
gathering is described in Section 5.
Fig-1: Domain module Building Process
In this semi-automatic method, the results of the LDO
and LO gathering can be supervised by instructional
designers and teachers, both collaboratively and individually,
using Elkar-DOM [10], a concept-map-based tool for
managing the domain module authoring process.
In this way, teachers can adapt the resulting domain
module to their needs or teaching preferences.
3. TEXTBOOK PREPROCESSING
During this stage, the system prepares the electronic
document and builds a structured representation of it, on which
the knowledge acquisition processes are performed later. As
electronic documents are available in various formats, such as
RTF, ODF, HTML and plain text, preprocessing is performed first to
prepare the document. The content of electronic documents is
usually organized in a hierarchical structure: documents
contain chapters, which are in turn divided into
sections, and so on. For a document, a tree-like internal
representation is constructed, so that the other processes can be
performed independently of the format of the original
document. The internal representations gathered for the
index (i.e., the outline) and for the whole document body are then
analyzed linguistically to enrich them with part-of-speech
information.
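The tree-like internal representation can be pictured with a small sketch. The class name OutlineNode, the function build_outline_tree and the rule that a child's number extends its parent's number by one dotted component are assumptions made for illustration; the paper does not specify its data structures.

```python
from dataclasses import dataclass, field


@dataclass
class OutlineNode:
    """One outline item (chapter, section, ...) in the internal tree."""
    number: str                      # e.g. "13.4"; "" for the document root
    title: str
    children: list = field(default_factory=list)


def build_outline_tree(entries):
    """Build a tree from (number, title) pairs given in document order.

    Assumes a child's number extends its parent's number by one dotted
    component ("13" -> "13.4"); items whose parent is missing are
    attached to the root.
    """
    root = OutlineNode("", "document")
    by_number = {"": root}
    for number, title in entries:
        number = number.strip(".")
        parent_number = number.rpartition(".")[0]
        parent = by_number.get(parent_number, root)
        node = OutlineNode(number, title)
        parent.children.append(node)
        by_number[number] = node
    return root


# Outline fragment taken from Table-1.
outline = [
    ("13.", "Balanced Trees"),
    ("13.3", "Top down 234 Trees"),
    ("13.4", "Red-black Trees"),
]
tree = build_outline_tree(outline)
```

Once the outline is held in such a tree, later steps can work on it without knowing whether the source was HTML, RTF or plain text.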
4. GATHERING OF LDO
Ontology learning, i.e., gathering domain
ontologies from various sources in an automatic or semi-automatic
way, has been discussed in various works [11].
Most of these approaches aim at constructing or
extending a domain ontology, or at populating lexical ontologies
such as WordNet [12]. Ontology learning mainly combines NLP
techniques and machine learning to build domain ontologies
or to enrich and populate small base ontologies. Various
resources such as text corpora, machine-readable dictionaries,
lexical ontologies or document warehouses are usually used
as information sources for ontology learning.
In this method, the LDO consists of the main topics of the
domain and the pedagogical relationships among them.
Pedagogical relationships may be structural (isA and partOf)
or sequential (prerequisite and next). The isA
relationship states that topic X is a specific kind of topic
Y. The partOf relationship states that topic X is part of topic Y, i.e., X
is a topic to be learned in order to completely master Y. The prerequisite
relationship expresses that topic X must be mastered
before trying to learn topic Y, while next states that it is
recommended to learn topic Y straight after mastering topic X.
Acquisition of the LDO is composed of two main steps based on
heuristic reasoning and NLP: the outline analysis, which results
in an initial version of the LDO, and the analysis of the document body,
which enriches the ontology with new domain topics and
relationships.
4.1 Outline Analysis
The outlines of documents are the primary information
source for semi-automatic LDO acquisition
because they are generally well structured and contain the
important domain topics. The outline analysis is composed of
two main stages:
Basic Analysis: The main domain topics and the
relationships between them are identified
during this phase from the internal representation of the
outline. Here, every outline item is treated as a domain
topic. In addition, the structure of the document outline is used as an
aid to infer pedagogical relationships. A subitem of an
outline item usually describes a part of it or a specific case of it.
As a result, structural relationships are defined between each
outline item and all its subitems. Moreover, the order of the outline
items mirrors the recommended sequence for mastering the
domain topics. Hence, an initial set of sequential relationships
is identified from the order of the outline items.
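A minimal sketch of the basic analysis follows, assuming the outline is available as (number, title) pairs in document order. The function name basic_analysis and the triple format (X, relation, Y) are illustrative choices, and the default labels partOf and next follow the defaults described in Sections 4.1.1 and 4.1.2.

```python
def basic_analysis(entries):
    """Return (structural, sequential) relationship lists from an outline.

    entries: (number, title) pairs in document order, numbers like "2.3".
    Each subitem is linked to its parent item with the default label
    "partOf"; consecutive siblings are linked with the default label "next".
    """
    titles = {}
    children = {}
    for number, title in entries:
        number = number.strip(".")
        titles[number] = title
        parent = number.rpartition(".")[0]
        children.setdefault(parent, []).append(number)

    structural, sequential = [], []
    for parent, kids in children.items():
        if parent in titles:  # skip the implicit document root
            structural += [(titles[k], "partOf", titles[parent]) for k in kids]
        sequential += [
            (titles[a], "next", titles[b]) for a, b in zip(kids, kids[1:])
        ]
    return structural, sequential


# Outline fragment taken from Table-1.
outline = [
    ("2.", "Principles of Algorithm Analysis"),
    ("2.2", "Analysis of Algorithms"),
    ("2.3", "Growth of Functions"),
    ("2.6", "Examples of Algorithm Analysis"),
]
structural, sequential = basic_analysis(outline)
```

For this fragment the sketch yields three partOf relationships towards item 2 and two next relationships between its consecutive subitems; the heuristic analysis would then reclassify or extend them.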
Heuristic Analysis: The results of the basic analysis are refined
using a set of heuristics that both classify the
relationships identified in the previous step and mine
new relationships, primarily prerequisite relationships. Each
identified relationship is annotated with its inferred
type, the heuristic applied, and the confidence in the inference.
The heuristic analysis is performed in two phases:
first, the heuristics for identifying structural relationships
are applied to classify isA and partOf relationships. Then, the
heuristics for sequential relationships are applied to
identify prerequisite and next relationships. The
heuristic analysis works on the assumption that each
outline item reflects a distinct domain topic. However,
some frequent items such as “introduction”,
“summary”, or “conclusions” can be found more than
once in the outline. To resolve this issue, a graphical tool
was built to allow the authors of the domain module to edit the
index so that it conforms to the assumptions of the heuristic
analysis.
To perform this analysis, a set of heuristics must be defined.
Some heuristics are language dependent, since they rely on
linguistic structures that differ depending on the language
they are stated for. This analysis has been applied to
documents in the Basque language. The set of heuristics was
identified at the University of the Basque Country (UPV/EHU)
on a set of 150 outlines of different subjects. That
study allowed the identification of the set of heuristics
and their levels of confidence [13].
4.1.1 Structural Relationships Identification
The heuristics for determining structural relationships
identify the kind of relationship between an
outline item and its subitems. The heuristic analysis
runs under the assumption that only one type of structural
relationship occurs between an outline item and all its
subitems. The analysis showed that the most frequent
structural relationship is partOf. Therefore,
structural relationships are classified as partOf by default.
Moreover, some common structures were observed in
the identified isA relationships. Individual heuristics, i.e.,
heuristics that check whether a specific subitem satisfies a
rule, were defined for identifying structural relationships in
outline items with heterogeneous subitems.
Individual Structural Heuristics: These heuristics
check whether a particular subitem satisfies a condition. The
condition might also involve the general item. The
experimental analysis showed that different heuristics
of this type can fire together in the same group of
subitems. Table-1 shows some outline fragments in which
the heuristics for identifying the isA and partOf relationships
can be applied.
1. Multiword heuristic (MWH). Multiword terms
carry information that helps to infer the isA relation.
Genus et differentiam is one of the most typical ways to
define new topics, and this pattern has been used to
collect taxonomic relationships among topics from
thesauri, dictionaries or other information sources.
This pattern can take several forms:
noun+adjective, noun+noun phrase, and so on. If the
noun present in these patterns (algorithm) is
the same as the outline item (algorithms), the isA
relationship is the most plausible.
2. Entity Name heuristic (ENH). Entity names
are used to identify instances of a specific entity.
When the subitems contain entity names, the
relation between the item and the subitems can be
considered an isA relationship. In Table-1, Red-black
Trees is an entity name, which denotes a
specific instance of Balanced Trees.
3. Acronyms heuristic (AH). Acronyms are used by
authors to refer to domain topics that have long
and frequently used names. When the subitems
consist only of acronyms, the most likely structural
relation is isA. In Table-1, the acronym in Pushdown
Stack ADT refers to the name of the item, Abstract
Data Type. Therefore, there is an implicit isA
relation between the item and its subitems.
4. Head of the phrase+multiword heuristic
(He+MWH). This heuristic checks whether an outline
subitem forms a multiword term built on the head of
the phrase of the outline item. Table-1 shows an example:
Priority-Queue ADT refers to a particular case of
Priority Queues and Heap sort, and therefore
the isA relationship can be inferred.
5. Possessive Genitive heuristic for identifying
structural relations (PGH1). Possessive genitives (in
English, the preposition "of") contain references to other
content. They are typically used to name just parts of the
content, so this heuristic supports the inference of an
underlying partOf relation between the outline item
and its subitems.
Table-1: Outline Fragments in Which the Heuristics for the
isA and partOf Relationships Can Be Applied
Heuristic                      Example
Multiword (MWH)                2. Principles of Algorithm Analysis
                                 2.2 Analysis of Algorithms
Entity name (ENH)              13. Balanced Trees
                                 13.3 Top down 234 Trees
                                 13.4 Red-black Trees
Acronyms (AH)                  4. Abstract Data Type
                                 4.1 Pushdown Stack ADT
Head of the phrase +           9. Priority Queues and Heap sort
Multiword (He+MWH)               9.5 Priority-Queue ADT
Possessive genitive (PGH1)     2. Principles of Algorithm Analysis
                                 2.2 Analysis of Algorithms
                                 2.3 Growth of Functions
                                 2.6 Examples of Algorithm Analysis
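The following sketch illustrates how individual structural heuristics of this kind might be applied to a group of subitems. It is a rough approximation for English titles only: the paper's heuristics were defined for Basque and rely on linguistic analysis, whereas here the head noun is approximated by the last words of the title, plurals are stripped naively, an acronym is any all-capitals token, and the function names are hypothetical. The pair "Algorithms" / "Union-find algorithms" is used only as an illustrative input.

```python
import re


def _words(title):
    """Lower-case words with a crude plural strip (assumption, not a stemmer)."""
    words = re.findall(r"[a-z]+", title.lower())
    return [w[:-1] if w.endswith("s") and not w.endswith("ss") else w for w in words]


def multiword_heuristic(item_title, subitem_title):
    """MWH sketch: the subitem is a multiword term ending with the item term.

    Titles containing "of" are left to the possessive genitive heuristic.
    """
    item, sub = _words(item_title), _words(subitem_title)
    if not item or "of" in sub:
        return False
    return len(sub) > len(item) and sub[-len(item):] == item


def acronym_heuristic(subitem_title):
    """AH sketch: the subitem contains an all-capitals token such as "ADT"."""
    return re.search(r"\b[A-Z]{2,}\b", subitem_title) is not None


def classify_structural(item_title, subitem_titles):
    """Return "isA" if a heuristic fires for every subitem, else "partOf"."""
    if not subitem_titles:
        return "partOf"
    if all(acronym_heuristic(s) for s in subitem_titles):
        return "isA"
    if all(multiword_heuristic(item_title, s) for s in subitem_titles):
        return "isA"
    return "partOf"


print(classify_structural("Abstract Data Type", ["Pushdown Stack ADT"]))
print(classify_structural("Algorithms", ["Union-find algorithms"]))
```

Requiring the heuristic to fire for every subitem mirrors the assumption that a single type of structural relationship holds between an item and all its subitems; partOf remains the default when nothing fires.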
4.1.2 Sequential Relationships Identification
Sequential relationships are of two kinds. The next relationship
states that a topic should be mastered just after
another one; it appears among items at the same level, i.e.,
subitems of the same general item. A prerequisite relation
between two domain topics states that one topic
should be mastered before attempting to master the other.
Prerequisite relationships can hold between
any outline items. Any sequential relation is
labeled as next by default. All the heuristics for sequential
relationships check whether a specific outline item and a preceding
one satisfy a rule that identifies a sequential relationship. The
following heuristics are used to identify prerequisite relationships:
1. Reference heuristic (RH). This heuristic identifies a
prerequisite relationship when an outline item
refers to a previous topic, not necessarily at the
same level (see Table-2).
2. Possessive genitives heuristic for sequential relations
(PGH2). Possessive genitives between outline items
at the same nesting level can be used to identify
prerequisite relationships (see Table-2).
Table-2: Outline Fragments in Which the Heuristics for the
prerequisite Relationships Can Be Applied
Heuristic                      Example
Reference (RH)                 1.1. Algorithms
                               1.3. Union-find algorithms
                               2. Principles of algorithm analysis
Possessive genitives (PGH2)    1.5. Summary of topics
                               2. Principles of algorithm analysis
                                 2.2. Analysis of Algorithms
                                 2.3. Growth of Functions
                                 2.6. Examples of Algorithm Analysis
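As an illustration of the reference heuristic, the sketch below flags a prerequisite relationship whenever a later outline title mentions the full title of an earlier item. The word-level matching, the naive plural stripping and the function names are assumptions for English titles; the heuristic described in the paper was defined for Basque outlines and is not specified at this level of detail.

```python
import re


def _norm(title):
    """Lower-case word list with a crude plural strip (assumption)."""
    words = re.findall(r"[a-z]+", title.lower())
    return [w[:-1] if w.endswith("s") and not w.endswith("ss") else w for w in words]


def _contains(words, phrase):
    """True if the word list `phrase` occurs contiguously inside `words`."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def reference_heuristic(titles):
    """Return (earlier, "prerequisite", later) triples.

    titles: outline item titles in document order. A later item that
    mentions an earlier item's full title is assumed to require it.
    """
    found = []
    normalised = [_norm(t) for t in titles]
    for j, later in enumerate(normalised):
        for i in range(j):
            earlier = normalised[i]
            if earlier and earlier != later and _contains(later, earlier):
                found.append((titles[i], "prerequisite", titles[j]))
    return found


# Titles from the Reference (RH) row of Table-2.
outline = ["Algorithms", "Union-find algorithms", "Principles of algorithm analysis"]
relations = reference_heuristic(outline)
```

On the Table-2 fragment, both later items mention "algorithm(s)", so "Algorithms" is proposed as a prerequisite of each; the levels of the items are deliberately ignored, since a reference need not be at the same level.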
4.2 Analysis of Document Body
During this phase, the LDO is complemented with new
topics and relationships gathered from the body of the whole
document. To accomplish this goal, two processes are
performed: first, new topics are identified, and then new
pedagogical relationships among topics are identified.
The first process aims at enriching the LDO gathered in the
previous step with new domain topics. To obtain such new
topics, the document body is analyzed. The approach uses a set
of patterns to obtain a set of candidate terms and then applies
termhood measures to rank the candidates
and select the most relevant ones. In this stage, term
extraction is carried out using Erauzterm [14], which looks
for the most common noun-phrase structures, to collect new
domain topics.
A pattern-based approach is used for the identification of
new pedagogical relationships from the electronic
document. These patterns identify pedagogical relationships
among domain topics based on the syntactic structures found
in the sentences where the topics appear. Therefore, the internal
representation of the document is first annotated to label the
domain topic appearances. The rules for recognizing
pedagogical relationships include rules for recognizing
the structural relationships isA and partOf.
5. GATHERING OF LOs
The generation of LOs for the domain topics is achieved by
identifying and gathering DRs, i.e., consistent
document fragments related to one or more topics with a
specific educational purpose. The identification and extraction of
these fragments is performed in an ontology-driven process. At
this stage, the LOs (definitions, exercises, examples,
etc.) to be used during the learning process are identified
and gathered. The process is performed by ErauzOnt [9].
Fig-2 describes the process for gathering the LOs from the
electronic document, which involves the following tasks:
generating DRs from the document, annotating the DRs so that
they become LOs, and, lastly, storing the LOs in a Learning
Objects Repository (LOR) for future use. To construct the
LOs from the gathered DRs, the LDO and the ALOCOM
ontology [15] are used, and, lastly, the LOs are stored in
the LOR to ease their reuse.
The LO generation aims to be domain independent. DR
identification is performed by finding relevant
fragments of text that correspond to resources such as
definitions, theories, facts, problem statements and principle
statements for the LDO topics. First, the appearances of the
LDO topics are labeled in the internal representation of the
document, which is annotated with part-of-speech
information. Second, the DR grammar is used to find
fragments of text that might contain appropriate resources.
The DR grammar contains a set of rules that describe
different syntactic structures or patterns. These patterns are
the most usual syntactic structures observed in topic
definitions. The precision of each grammar rule is used as
the confidence in that rule. An identified DR
consists of the sentence that triggered the rule for the
corresponding DR and every following sentence
that refers to the same topic. Every DR is annotated with
the domain topics it refers to and with the DR rules that
identified it. The LO annotation process uses this
information later. The search for text fragments is restricted
to the domain topics described in the LDO. The gathered DRs
are intended to be coherent and cohesive [16]. NLP
techniques that combine a DR grammar and discourse
markers are used, together with a didactic ontology, i.e., an
ontology that describes the different kinds of DRs that can
be used in learning sessions, to achieve this goal.
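To make the idea of a rule-based DR grammar concrete, the sketch below matches a few definition patterns against sentences that mention LDO topics. The patterns, the rule names and the function find_definitions are hypothetical English examples; the rules of the actual DR grammar are not given in the paper, and the real process works on part-of-speech annotated text rather than raw strings.

```python
import re

# Hypothetical English definition patterns, for illustration only.
DEFINITION_RULES = {
    "is-a": r"\b{topic}\b\s+(?:is|are)\s+(?:a|an|the)\b",
    "defined-as": r"\b{topic}\b\s+(?:is|are)\s+defined\s+as\b",
    "called": r"\b(?:is|are)\s+called\s+(?:a|an|the)?\s*{topic}\b",
}


def find_definitions(sentences, topics):
    """Return (sentence index, topic, rule name) for each rule that fires.

    Only sentences mentioning a topic from `topics` can match, which mirrors
    the restriction of the search to topics described in the LDO.
    """
    hits = []
    for idx, sentence in enumerate(sentences):
        for topic in topics:
            for name, template in DEFINITION_RULES.items():
                pattern = template.format(topic=re.escape(topic))
                if re.search(pattern, sentence, flags=re.IGNORECASE):
                    hits.append((idx, topic, name))
    return hits


sentences = [
    "A priority queue is a data structure that supports insertion and removal of the largest item.",
    "We return to this idea in later chapters.",
]
hits = find_definitions(sentences, ["priority queue"])
```

Recording the rule name with each hit corresponds to annotating every DR with the rule that identified it, so that a per-rule confidence can be attached later.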
Once the DRs contained in the textbook have been identified
and gathered, LOs are built from them. After this, the
metadata for each LO is generated to assure that the LOs can
be found and retrieved from the LOR they are stored in. The
LDO and the ALOCOM ontology are used to ensure LO
reusability.
Finally, the LOs generated are stored in the LOR so that
they can be reused either for the Domain module being
developed or any future TSLS.
Fig-2: Generation of LOs from documents
6. RESULTS AND DISCUSSIONS
The system presented in this paper has been tested,
with the intention of validating it, on a textbook obtained
from www.ics.uci.edu. The textbook used, Algorithms in
Java: Parts 1-4, Third Edition by Robert Sedgewick,
covers computer programming. The main goal of
this experiment was to evaluate how the proposed system
helps teachers to build the domain module by
measuring the knowledge, in both the LDO and the LOs,
automatically gathered from the textbook. For the
experiment, only text-based LOs were considered, so an
adapted version of the electronic textbook in which the
images were removed was used. The analyzed textbook has
768 pages. The outline of the document has 16 main items,
each with its own subitems. All documents are in
HTML format.
To evaluate the generation of the domain
module, a reference LDO and set of LOs were needed to
compare against the obtained results. Therefore, instructional
designers collaborated to manually develop the LDO and
the LOs for the domain topics. The generation of the LDO is
evaluated based on the amount of automatically gathered
knowledge, i.e., domain topics and pedagogical
relationships, and the correctness of the proposed topics and
relationships. The details of this evaluation are presented in
Section 6.1. The evaluation of the LO generation considered
the adequacy of the identified LOs (Section 6.2).
6.1 Evaluation of the Gathered LDO
The LDO Builder is the subsystem responsible for extracting
the LDO from electronic textbooks. It gathers the contents
of the LDO from both the outline and the whole textbook.
Nevertheless, at the time this evaluation was conducted, the
LDO Builder only supported the analysis of document
outlines and, therefore, only that feature was tested.
Table-3: Summary of the LDO Relationships in the
Analyzed Outlines
             Structural          Sequential
             Relationships       Relationships
             isA     partOf      next    prerequisite    Total
Real         60      19          272     124             475
Found        41      19          282     118             460
Correct      29      19          245     115             408
The acquisition of the LDO from the document outlines
works under the assumption that every outline item
represents a unique domain topic and relies on the use of
heuristics for the identification of pedagogical relationships.
The automatically gathered pedagogical relationships were
evaluated by comparing them to the pedagogical
relationships defined in the LDOs collaboratively developed
by instructional designers. In order to simplify the
evaluation, the instructional designers were asked to use
only the domain topics referred to in the outlines for building
their agreed LDOs. The pedagogical relationships were,
therefore, restricted to relationships among those topics.
Table-3 summarizes the information about the acquisition of
the pedagogical relationships from the textbook outlines.
The LDOs built from the analyzed outlines contained 475
pedagogical relationships (19 partOf, 60 isA, 272 next, and
124 prerequisite). The LDO Builder correctly identified 408
of the 460 found pedagogical relationships (19 of 19 partOf,
29 of 41 isA, 245 of 282 next, and 115 of 118 prerequisite).
Table-4 presents the statistics on the generation of the LDOs
from the analyzed outlines, including the recall, the
precision and the f-measure. Precision is defined as the fraction
of retrieved occurrences that are relevant, recall is
defined as the fraction of relevant occurrences that are retrieved,
and the f-measure is the harmonic mean of precision and recall.
The LDO Builder achieved an overall recall of
85.89% and a precision of 88.7%; hence, the f-measure was
87.27%. As can be observed, precision and recall coincide for
partOf relationships, while they are lower for isA
relationships, which require deeper domain knowledge in
order to be identified: isA yields the lowest values,
48.33% for recall and 70.73% for precision.
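The definitions above can be checked against the counts in Table-3 with a short worked computation. The sketch takes precision as Correct/Found and recall as Correct/Real, which is an interpretation of the table rows rather than something the paper states explicitly; the names COUNTS, scores and overall are illustrative.

```python
# Relationship counts copied from Table-3: (real, found, correct).
COUNTS = {
    "isA": (60, 41, 29),
    "partOf": (19, 19, 19),
    "next": (272, 282, 245),
    "prerequisite": (124, 118, 115),
}


def scores(real, found, correct):
    """Return (precision, recall, f_measure) as percentages."""
    precision = 100.0 * correct / found
    recall = 100.0 * correct / real
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def overall(counts):
    """Scores computed on the column totals (475 real, 460 found, 408 correct)."""
    real, found, correct = (sum(column) for column in zip(*counts.values()))
    return scores(real, found, correct)


for name, triple in COUNTS.items():
    print(name, ["%.2f" % value for value in scores(*triple)])
print("Total", ["%.2f" % value for value in overall(COUNTS)])
```

Under this reading the isA, partOf and Total figures reproduce the reported precision and recall (for example 29/41 = 70.73% and 29/60 = 48.33% for isA, and 88.70% / 85.89% / 87.27% overall). For the next and prerequisite columns some recomputed values differ from the reported ones (for instance, 115/124 gives a prerequisite recall of 92.74%), so those entries may be worth re-checking against the original data.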
As can be observed in Chart-1, recall and
precision are quite similar for most relationship types, but
a significant difference was found for isA relationships. The
reference LDOs used for the evaluation were
limited to the topics in the outlines. Nevertheless, the
precision achieved indicates that the approach is an
accurate and suitable means of acquiring the LDO from
electronic textbooks.
Table-4: Statistics of the Automatic Acquisition of the LDO
Relationships from the Outlines
             Structural          Sequential
             Relationships       Relationships
             isA     partOf      next    prerequisite    Total
Precision    70.73   100         86.88   97.46           88.7
Recall       48.33   100         90.07   100             85.89
F-measure    57.42   100         88.22   98.71           87.27
Chart-1: Statistics of the acquisition of LDO
6.2 Evaluation of LO
LO generation is an ontology-driven process that
identifies the fragments of the textbook with an educational
purpose related to the domain topics, builds LOs from the
identified didactic resources, and stores the LOs in the
Learning Object Repository. LO acquisition is more difficult
to assess, as an LO might be the most appropriate in a
particular context, while one of its components or a more
complex LO (a composite LO that contains it) might fit
better in other situations. The LOs gathered from the electronic
textbook were tested with the aim of validating the system's
performance.
The outline of the textbook used for evaluation was divided
into four parts, namely fundamentals, data structures, sorting
and searching. A search was conducted for every input
keyword; it returns not only the documents that match
the input exactly but also the documents that contain the
input. The input keyword is checked against the identified
pedagogical relationships, and the documents in which
the input matches a pedagogical relationship are retrieved.
Here, first priority is given to the outline of the textbook
and then to the document content.
Table-5: Summary of the LDO Relationships Matching the
Input Keyword
                  Algorithm  Sort   Data Structure  Search  Tree   Total
Relations         8970       8970   8970            8970    8970   44850
Relations Found   298        298    298             298     298    1490
Relations Match   288        221    96              147     128    914
Table-5 shows the total relationships, the
relationships found and the matching relationships for the
different keywords given as input. The keyword algorithm
has the highest number of matching LDO relationships,
whereas data structure has the lowest.
Accuracy is calculated for each input keyword. Table-6
shows the accuracy results. The keyword algorithm has the
maximum accuracy. Chart-2 shows a graph of the
statistics of the LDO relationships matching the input
keyword.
Table-6: Statistics of the LDO Relationships Matching the
Input Keyword
                    Algorithm  Sort   Data Structure  Search  Tree   Total
Total Heuristics    3.32       3.32   3.32            3.32    3.32   3.32
Matched Heuristics  3.21       2.46   1.07            1.64    1.43   2.04
Chart-2: Statistics of the LDO Relationships Matching the
Input Keyword
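The per-keyword values in Table-6 appear to be the Table-5 counts expressed as a percentage of the 8970 relations per keyword. This reading is an inference from the numbers, not a definition given in the text, and the names below are illustrative. A short computation under that assumption:

```python
TOTAL_RELATIONS = 8970   # "Relations" row of Table-5, per keyword
FOUND_RELATIONS = 298    # "Relations Found" row of Table-5, per keyword
MATCHED = {              # "Relations Match" row of Table-5
    "Algorithm": 288,
    "Sort": 221,
    "Data Structure": 96,
    "Search": 147,
    "Tree": 128,
}


def percentage(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


total_heuristics = percentage(FOUND_RELATIONS, TOTAL_RELATIONS)
matched_heuristics = {k: percentage(v, TOTAL_RELATIONS) for k, v in MATCHED.items()}

print("Total Heuristics:", total_heuristics)
for keyword, value in matched_heuristics.items():
    print(keyword, value)
```

Under this assumption the computation reproduces the per-keyword entries of both Table-6 rows, with algorithm highest and data structure lowest.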
7. CONCLUSION
This paper has presented a system for the semi-automatic
construction of the domain module from electronic
textbooks. The system employs heuristic reasoning, NLP
techniques, and ontologies for the knowledge acquisition
processes. The domain module is constructed from an
electronic textbook, which may be provided in different
document formats, so that authoring time is reduced; as no
domain-specific knowledge is required, the approach
is domain independent. The domain module comprises the LDO and
the LOs. The LDO contains the main domain topics and the
pedagogical relationships among the topics, and the LOs are
used to facilitate the learning of individual domain topics. The
automatically generated domain module has limited recall
when compared with the manually developed domain module.
LDO acquisition achieved a precision of
88.7%. For the LOs, domain topics along with useful resources
based on pedagogical relationships have been retrieved. The
performance compares favorably with that of manual
authoring. The work can be extended further by
improving the LDO generation. It is planned to
strengthen the rules for recognizing pedagogical
relationships in order to increase the recall of the relationships.
The system is presently able to process images in
the electronic document by considering their location in the
text, but not where the image is referenced, which would also
be useful. Thus, the treatment of images must be enhanced. The
system is also being extended to support multilingual
generation of the domain module.
REFERENCES
[1] B. Parsad and L. Lewis, “Distance Education at
Degree-Granting Post Secondary Institutions: 2006-
07,” technical report, Nat’l Center for Education
Statistics, Inst. of Education Sciences, US
Department of Education, 2008.
[2] P.-S.D. Chen, A.D. Lambert, and K.R. Guidry,
“Engaging Online Learners: The Impact of Web-
Based Learning Technology on College Student
Engagement,” Computers and Education, vol. 54, no.
4, pp. 1222-1232, May 2010.
[3] J.R. Anderson, “The Expert Module,” Foundations of
Intelligent Tutoring Systems, M.C. Polson and J.J.
Richardson, eds., pp. 21-54, Lawrence Erlbaum,
1988.
[4] W. Chen, R. Lu, W. Zhang, and H. Du, “A Tool for
Automatic Generation of Multimedia ICAI Systems,”
Proc. Int’l Conf. Artificial Intelligence in Education
(AIED ’97), pp. 571-573, 1997.
[5] M. Lentini, D. Nardi, and A. Simonetta, “Self-
instructive Spread-sheets: An Environment for
Automatic Knowledge Acquisition and Tutor
Generation,” Int’l J. Human-Computer Studies, vol.
52, no. 5, pp. 775-803, 2000.
[6] A. Zouaq and R. Nkambou, “Evaluating the
Generation of Domain Ontologies in the Knowledge
Puzzle Project,” IEEE Trans. Knowledge and Data
Eng., vol. 21, no. 11, pp. 1559-1572, Nov. 2009.

_______________________________________________________________________________________
[7] A. Maedche and S. Staab, “Ontology Learning for
the Semantic Web,” IEEE Intelligent Systems, vol.
16, no. 2, pp. 72-79, Mar. 2001.
[8] P. Velardi, R. Navigli, A. Cucchiarello, and F. Neri,
“Evaluation of OntoLearn, a Methodology for
Automatic Learning of Domain Ontologies,”
Ontology Learning from Text: Methods,
Applications, and Evaluation, P. Buitelaar, P.
Cimiano, and B. Magnini, eds., pp. 92-106, IOS
Press, 2005.
[9] I. Alegria, A. Gurrutxaga, P. Lizaso, X. Saralegi, S.
Ugartetxea, and Urizar, “An XML-Based Term
Extraction Tool for Basque,” Proc. Fifth Int’l Conf.
Language Resources and Evaluations (LREC ’04),
2004.
[10] M. Larranaga, I. Niebla, U. Ruedat, J.A. Elorriaga,
and A. Arruarte, “Towards Collaborative Domain
Module Authoring,” Proc. Seventh IEEE Int’l Conf.
Advanced Learning Technologies, pp. 814-818, July
2007.
[11] Semi-Automatic Ontology Development: Processes
and Resources, M.T. Pazienza and A. Stellato, eds.,
IGI Global, 2012.
[12] WordNet: An Electronic Lexical Database, C.
Fellbaum, ed., MIT Press, 1998.
[13] M. Larranaga, U. Rueda, J.A. Elorriaga, and A.
Arruarte, “Acquisition of the Domain Structure from
Document Indexes Using Heuristic Reasoning,”
Proc. Seventh Int’l Conf. Intelligent Tutoring
Systems (ITS ’04), pp. 175-186, 2004.
[14] I. Alegria, A. Gurrutxaga, P. Lizaso, X. Saralegi, S.
Ugartetxea, and Urizar, “An XML-Based Term
Extraction Tool for Basque,” Proc. Fifth Int’l Conf.
Language Resources and Evaluations (LREC ’04),
2004.
[15] M. Larranaga, I. Calvo, J.A. Elorriaga, A. Arruarte,
K. Verbert, and Duval, “ErauzOnt: A Framework for
Gathering Learning Objects from Electronic
Documents,” Proc. 11th IEEE Int’l Conf. Advanced
Learning Technologies (ICALT ’11), pp. 656-658,
2011.
[16] K. Verbert, D. Gasevic, J. Jovanovic, and E. Duval,
“Ontology-Based Learning Content Repurposing,”
Proc. 14th Int’l Conf. World Wide Web (WWW ’05),
pp. 1140-1141, 2005.
[17] M. Larranga, A. Conde, Inali Calvo, Jon A.
Elloriage, Ana Arruarte, “Automatic Generation of
Domain Module from Electronic Textbooks: Method
and Validation”, IEEE Transactions on Knowledge
and Data Engineering, Vol.26, No.1, pp. 69-82,
January 2014.
BIOGRAPHIES
P. Sowjanya Lakshmi is pursuing her
M.Tech in Computer Science
Engineering (CSE) at
G. Narayanamma Institute of Technology
and Science, Hyderabad, and completed
her B.Tech in Information Technology
(IT) at the same college in the year
2013. Her areas of research include data
mining.
Dr. M. Seetha completed her Ph.D in
Computer Science and Engineering in the
area of image processing in December
2007 from Jawaharlal Nehru
Technological University, Hyderabad,
India. She is presently working as
Professor in the Department of CSE at
GNITS, Hyderabad and has 20 years of teaching
experience. She is guiding 10 Ph.D scholars and her research
interests include image processing, neural networks,
computer networks and data mining. She has published
more than 70 papers in refereed journals and in the
proceedings of National/International Conferences and
Symposiums.
