LiDom Builder: Automatising the Construction of
Multilingual Domain Modules
Ángel Conde Manjón
GaLan Research Group – LSI Department
University of the Basque Country (UPV/EHU)
Supervisors:
Dr. Mikel Larrañaga Olagaray & Dr. Ana Arruarte Lasa
UPV/EHU
25 February 2016
• Technology Supported Learning Systems (TSLS)
• Learning Management Systems
• Massive Open Online Courses
• Intelligent Tutoring Systems: SQL-Tutor
• …
• Bilingual and multilingual contexts are a reality (UNESCO, 2003)
• Acquiring the Domain Module is a costly and labour-intensive task
Introduction
Acquisition of
Multilingual
Terminology
Identification of
Pedagogical
Relationships
Gathering Learning
Objects
Conclusions and
Future Work
LiDom Builder
Context
Main Goal
Automatising the construction of MULTILINGUAL DOMAIN MODULES
DOM-Sortze (Larrañaga, 2012): a framework for building DOMAIN MODULES from electronic textbooks
Previous Work: DOM-Sortze
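[Figure: DOM-Sortze process. An Electronic Textbook passes through three numbered stages (Preprocess, LDO Gathering, LOs Gathering) to produce the Domain Module. Intermediate artefacts: Document Body Internal Representation, Document Outline Internal Representation, Learning Domain Ontology.]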
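[Figure: example Domain Module. Topics: Planetary System, Solar System, Planet, Earth, Satellite, Moon, linked by partOf, isA and prerequisite relationships. Learning object LO1 ("The Moon is Earth's only natural satellite") is attached to a topic through hasDR.]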
DOM-Sortze: Domain Module Representation Formalism
Learning Domain Ontology (LDO)
Topics and pedagogical relationships
Learning Objects (LO)
• Definitions
• Examples
• Problem Statements
• …
Limitations of DOM-Sortze:
1. Developed for a single language: Basque.
2. Its formalism is not able to represent Multilingual Domain
Modules.
DOM-Sortze: Limitations
1. Can the formalism used in DOM-Sortze be enhanced for Multilingual Domain Modules?
– Extend the formalism to deal with Multilingual Domain Modules.
2. Which enhancements are required to deal with various languages?
– Develop a method for extracting Multilingual Terminology.
– Improve the Relationship Acquisition.
– Provide a method for acquiring Multilingual Learning Objects.
Automatising the construction of MULTILINGUAL DOMAIN MODULES
Goals
I. Introduction: Motivations and Goals
II. LiDom Builder: Building Multilingual Domain
Modules
III. Acquisition of Multilingual Terminology
IV. Identification of Pedagogical Relationships
V. Gathering Multilingual Learning Objects
VI. Conclusions and Future Work
Outline
II. LiDom Builder: Building Multilingual Domain Modules
Overview
LiDom Builder: a framework for automatising the acquisition of Multilingual Domain Modules
[Figure: LiDom Builder overview. Starting from a Textbook, Multilingual Terminology Extraction, Pedagogical Relationship Extraction and Multilingual Learning Object Generation yield the Domain Module, with equivalents in "en" and "es".]
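[Figure: multilingual Domain Module example. The topic graph (Planetary System, Solar System, Planet, Earth, Satellite, Moon) uses partOf, isA, prerequisite and pedagogicallyClose relationships. The labels "ilargi", "luna" and "moon" and the learning objects LO1 and LO2 (attached through hasDR) carry language tags (eu, en, es).]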
Multilingual Domain Module Formalism
[Figure: the DOM-Sortze process with the proposed enhancements. A Language Identification step is added to the three numbered stages (Preprocess, LDO Gathering, LOs Gathering) that turn an Electronic Textbook into the Domain Module through the Document Internal Representation, the Document Outline Internal Representation and the Learning Domain Ontology. Labels shown: NLP Parsers (Illinois Chunker, Illinois POS tagger, FreeLing, IXA-Pipes); Topic Extraction; Relationship Extraction (Set of Heuristics, Grammar); Multilingual LOs (Grammar, Discourse Markers).]
Proposed Enhancements
[Figure: the same process (Electronic Textbook, Preprocess, LDO Gathering, LOs Gathering, Domain Module) with the new tools LiTeWi, LiReWi and LiLoWi and external Knowledge Resources added.]
• Two phases
• Tuning up
• Set the thresholds and default confidence values.
• Evaluation
• Gold Standard (Recall, Precision, F1-Score).
• Expert validation.
• Use of three textbooks
1. Programming: Introduction to Object Oriented Programming (Wong, S., 2010).
2. Astronomy: Introduction to Astronomy (Morison, 2008).
3. Biology: Introduction to Molecular Biology (Raineri, 2010).
General Evaluation Methodology
III. Acquisition of Multilingual Terminology
In DOM-Sortze, terminology was extracted with ErauzTerm (Alegria et al., 2004).
For LiDom Builder, a new tool called LiTeWi has been developed.
Acquisition of Multilingual Terminology
LiTeWi
[Figure: LiTeWi process. Candidate Extraction from the Electronic Textbook (TF-IDF, KP-Miner, CValue, Shallow Parsing Grammar; a Generic Corpus is also used), Combination, Candidate Selection (Mapping, Disambiguation, Filtering) and Mapping to other languages.]
Shallow Parsing Algorithm
• Uses a grammar derived from (Larrañaga, 2012).
[Figure: shallow parsing of the Textbook. Sentences may contain topics, e.g. "This is called an Array List" and "A Stack is used to model systems that exhibit LIFO…". The Shallow Parser produces Chunks ("an Array List", "A Stack", …). Extraction Rules from a Constraint Grammar applied to POS tags (e.g. Topic + [*] + part of + [det] + Topic) yield the Topics (Array List, Stack, …).]
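The chunk-to-topic step can be pictured with a minimal Python sketch. It is a simplification under stated assumptions: LiTeWi applies a constraint grammar to POS tags, whereas this sketch only strips leading determiners from chunks that a shallow parser has already produced. The `DETERMINERS` stoplist and both function names are hypothetical.

```python
# Hypothetical stoplist; the real system relies on POS tags instead.
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}


def chunk_to_topic(chunk: str) -> str:
    """Drop leading determiners from a noun-phrase chunk."""
    words = chunk.split()
    while words and words[0].lower() in DETERMINERS:
        words = words[1:]
    return " ".join(words)


def topics_from_chunks(chunks):
    """Return de-duplicated topic candidates, keeping first-seen order."""
    seen, topics = set(), []
    for chunk in chunks:
        topic = chunk_to_topic(chunk)
        key = topic.lower()
        if topic and key not in seen:
            seen.add(key)
            topics.append(topic)
    return topics


if __name__ == "__main__":
    # Chunks taken from the slide's example.
    print(topics_from_chunks(["an Array List", "A Stack"]))
```

Case-insensitive de-duplication is a design choice of the sketch, so that "A Stack" and "the stack" count as one candidate.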
Mapping
• Terms mapped to their corresponding Wikipedia articles.
• Search procedure to match Wikipedia article titles and their labels.
Disambiguation
• Method based on global disambiguation (Milne et al., 2008).
• Domain knowledge step added to improve the results.
• The domain's important terms are used as the disambiguation context.
• Gold Term List: domain-important terms with only one sense, i.e. the monosemic terms that have the highest CValue scores.
[Figure: disambiguation of "Java" with the Wikiminer Compare Service.]
Term List (to disambiguate): Java, Inheritance, Property
Gold Term List: Class, Programming Language, Array List
Relatedness of each sense of "Java" to the gold terms:
Sense           Class  Prog. Lang.  Array List  Average
Prog. Language  0.90   0.85         0.64        0.89
Island          0.7    0.77         0.53        0.70
City            0.56   0.75         0.6         0.63
Disambiguated Term: Java (programming language)
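The sense selection in this example can be sketched in Python as follows. The relatedness values are the ones shown on the slide; how the Average column is actually computed is not stated, so the unweighted mean used here is an assumption and does not reproduce the slide's averages exactly. The names `JAVA_SENSES` and `pick_sense` are hypothetical.

```python
from statistics import mean

# Relatedness scores from the slide: sense of "Java" -> gold term -> score.
JAVA_SENSES = {
    "Prog. Language": {"Class": 0.90, "Prog. Lang.": 0.85, "Array List": 0.64},
    "Island": {"Class": 0.7, "Prog. Lang.": 0.77, "Array List": 0.53},
    "City": {"Class": 0.56, "Prog. Lang.": 0.75, "Array List": 0.6},
}


def pick_sense(senses):
    """Pick the sense with the highest mean relatedness to the gold terms.

    Assumption: an unweighted mean stands in for the slide's Average column.
    """
    averages = {sense: mean(scores.values()) for sense, scores in senses.items()}
    best = max(averages, key=averages.get)
    return best, averages


if __name__ == "__main__":
    best, averages = pick_sense(JAVA_SENSES)
    print(best, averages)
```

Even with this simpler average, the programming-language sense ranks first, which matches the disambiguated term on the slide.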
Filtering Unwanted Terms
[Figure: filtering unwanted terms with the Wikiminer Compare Service.]
Gold Term List: Solar System, Black Hole, Solar Mass
Term List (to filter): Universal Studios, Planet, Windows 98
• Relatedness Score between each term and the gold terms: Threshold (>= 0.6)
• Number of Related Gold Terms: N (> 1)
Relatedness scores shown: Solar System (0.34), Black Hole (0.53), Solar Mass (0.47); Solar System (0.23), Black Hole (0.68), Solar Mass (0.50)
Domain Related Term: Planet
Discarded: Universal Studios, Windows 98
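A minimal Python sketch of this filter, assuming a term is kept when more than one gold term reaches the relatedness threshold. Attributing the two score groups on the slide to Universal Studios and Windows 98 is our reading of the figure, and the scores for Planet are hypothetical because the slide does not show them. Function names are also hypothetical.

```python
def related_gold_terms(scores, threshold=0.6):
    """Gold terms whose relatedness to the candidate reaches the threshold."""
    return [gold for gold, score in scores.items() if score >= threshold]


def is_domain_related(scores, threshold=0.6, min_related=2):
    """Keep a term only if N > 1 gold terms are related to it."""
    return len(related_gold_terms(scores, threshold)) >= min_related


SCORES = {
    # Our reading of the slide's two score groups (an assumption).
    "Universal Studios": {"Solar System": 0.34, "Black Hole": 0.53, "Solar Mass": 0.47},
    "Windows 98": {"Solar System": 0.23, "Black Hole": 0.68, "Solar Mass": 0.50},
    # Hypothetical values: the slide gives no scores for Planet.
    "Planet": {"Solar System": 0.81, "Black Hole": 0.66, "Solar Mass": 0.70},
}

kept = [term for term, scores in SCORES.items() if is_domain_related(scores)]

if __name__ == "__main__":
    print(kept)
```

Requiring more than one related gold term guards against a single coincidental match, as with Windows 98 and Black Hole under this reading.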
LiTeWi: Mapping to other languages
Topic  EN    ES    EU
Moon   Moon  Luna  Ilargia
Evaluation
Tuning up
• Introduction to Object Oriented Programming textbook.
Evaluation
• Gold Standard and Expert Validation.
• Gold Standard based on the terms appearing in the index of each textbook.
• Evaluated on Introduction to Astronomy and Introduction to Molecular Biology.
Results
• Wikifier (Cheng, 2013)
              Gold Standard                            Expert Validation
              Precision (%)  Recall (%)  F1 Score (%)  Correctness (%)
Astronomy     3.55           62.96       6.72          18.55
Mol. Biology  2.24           10.21       3.67          49.27
• LiTeWi
              Gold Standard                            Expert Validation
              Precision (%)  Recall (%)  F1 Score (%)  Correctness (%)
Astronomy     17.96          72.55       28.79         78.77
Mol. Biology  27.09          50.53       87.70         71.65
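For reference, the F1 Score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The Python sketch below (the function name `f1_score` is ours) recomputes it from the precision and recall values in the tables. Three of the four rows reproduce the reported F1; the LiTeWi Mol. Biology row does not follow from its precision and recall, so one of its cells may have been garbled in transcription.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (system, textbook) -> (precision %, recall %), as given in the tables.
ROWS = {
    ("Wikifier", "Astronomy"): (3.55, 62.96),
    ("Wikifier", "Mol. Biology"): (2.24, 10.21),
    ("LiTeWi", "Astronomy"): (17.96, 72.55),
    ("LiTeWi", "Mol. Biology"): (27.09, 50.53),
}

if __name__ == "__main__":
    for key, (p, r) in ROWS.items():
        print(key, round(f1_score(p, r), 2))
```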
IV. Identification of Pedagogical Relationships
Introduction
In DOM-Sortze, relationships were acquired for Basque using shallow parsing.
An adaptation and extension of the heuristic-based analysis of the outline has been developed.
A new tool called LiReWi has been developed.
Heuristic-based analysis of the outline
Document Outlines
• Reflects the organization made by the author.
• The structure of the outline underlies pedagogical relationships.
• Low cost process (summarised).
DOM-Sortze
• Each outline item is considered as a domain topic.
• By default gathers a partOf relation between an item and its subitems.
• Heuristics to detect isA relations.
LiDom Builder
• Adaptation to English of heuristics from (Larrañaga et al., 2004).
• Improvement of isA identification using Wikitaxonomy (Ponzetto et al., 2007).
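The default heuristic, a partOf relation between each outline item and its subitems, can be sketched in Python. The numbering pattern (`4.5.1.- Title`) follows the outline style used in this presentation; the regular expression and the function name are assumptions made for illustration, not the DOM-Sortze implementation.

```python
import re

# Matches items such as "4.5.1.- Thermosettings" (numbering style assumed).
ITEM = re.compile(r"^\s*(\d+(?:\.\d+)*)\.-\s*(.+?)\s*$")


def default_part_of(outline_lines):
    """Emit (subitem, 'partOf', item) for every item whose parent is present."""
    titles = {}
    for line in outline_lines:
        match = ITEM.match(line)
        if match:
            titles[match.group(1)] = match.group(2)
    relations = []
    for number, title in titles.items():
        parent = number.rpartition(".")[0]  # "4.5.1" -> "4.5", "4" -> ""
        if parent in titles:
            relations.append((title, "partOf", titles[parent]))
    return relations


if __name__ == "__main__":
    outline = [
        "4.5.- Families of polymeric materials",
        "4.5.1.- Thermosettings",
        "4.5.2.- Thermoplastics",
        "4.5.3.- Elastomers",
    ]
    for relation in default_part_of(outline):
        print(relation)
```

The isA heuristics would then revise some of these default partOf relations; that step is not covered by this sketch.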
Wikipedia Enhanced Process
1. Identify groups of sibling nodes.
2. Select the groups of leaf nodes in which the partOf relationship has been identified.
3. Link and disambiguate each node to a Wikipedia article using Wikiminer (Milne et al., 2012).
4. Process every group using the (Ponzetto et al., 2007) taxonomy.
5. Infer the isA relationship in those groups that share a common ancestor.
Example outline:
………..
4.- Structure of polymers / Macromolecules
4.1.- Polymer chemistry
4.2.- Molecular weight
4.3.- Form, structure and molecular configuration
4.3.- Supramolecular arrangement
4.4.- Crystalline and amorphous polymers
4.5.- Families of polymeric materials
4.5.1.- Thermosettings
4.5.2.- Thermoplastics
4.5.3.- Elastomers
5.- Phase diagrams / Definitions
5.1.- Solid solutions
5.2.- Phases rule of Gibbs
5.3.- Types of phase diagram
Step 3 example: Thermosetting polymer (Article id = 321827), Thermoplastic (Article id = 182444), Elastomer (Article id = 842224)
Step 4 example (categories shown): Materials science, Elastomers, Polymer physics, Polymer chemistry
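Steps 4 and 5 can be sketched in Python. The taxonomy fragment below is hypothetical (including the "Polymers" category and every edge), as is the search depth; the sketch also assumes that the inferred isA relation links each sibling to the outline item they hang from, which is our reading of the slide.

```python
def ancestors(node, parents, max_depth=3):
    """Collect taxonomy ancestors of a node up to max_depth levels."""
    found, frontier = set(), {node}
    for _ in range(max_depth):
        frontier = {p for n in frontier for p in parents.get(n, ())} - found
        if not frontier:
            break
        found |= frontier
    return found


def infer_is_a(group, parent_topic, parents, max_depth=3):
    """Emit isA for every sibling when the whole group shares an ancestor."""
    if not group:
        return []
    common = set.intersection(*(ancestors(n, parents, max_depth) for n in group))
    if not common:
        return []
    return [(node, "isA", parent_topic) for node in group]


# Hypothetical taxonomy fragment: child -> parent categories.
PARENTS = {
    "Thermosetting polymer": ["Polymer chemistry"],
    "Thermoplastic": ["Polymer physics"],
    "Elastomer": ["Elastomers"],
    "Elastomers": ["Polymer physics"],
    "Polymer chemistry": ["Polymers"],
    "Polymer physics": ["Polymers"],
}

if __name__ == "__main__":
    group = ["Thermosetting polymer", "Thermoplastic", "Elastomer"]
    print(infer_is_a(group, "Families of polymeric materials", PARENTS))
```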
Evaluation
Gold Standard
• 57 document outlines in English from different
domains.
• Human instructors defined the optimal output (LDOs).
• Each LDO restricted to the topics of the outline.
Results
• Heuristic Analysis
               partOf  isA    Total
Precision (%)  84.12   78.95  83.85
Recall (%)     98.66   21.20  83.85
• Heuristic Analysis + Wikipedia Enhanced Process
               partOf  isA    Total
Precision (%)  89.19   77.30  87.70
Recall (%)     96.49   50.53  87.70
Identification of Pedagogical Relationships: LiReWi
[Figure: LiReWi. Inputs: Electronic Textbook, Topics, Knowledge Bases. Stages: Mapping, Candidate Relationship Extraction, Combination & Filtering. Output: Learning Domain Ontology.]
Mapping
[Figure: mapping a topic to WordNet.]
Topic: Syntax (Wikipedia id = 3206060, WordNet id = ?)
Mapping to WordNet, using two mapping tables:
• Fernando's Mappings: Wiki id 3206060 → WordNet id 8436203, …
• BabelNet Mappings: Wiki id 3206060 → WordNet id 6176322, …
Comparer: the two mappings differ (8436203 != 6176322).
Disambiguation: Page Rank over the candidate WordNet ids (8436203, 6176322, …) with the Disambiguation Context (Java, Programming, …).
Final id: mapped WordNet id returned = 6176322
Candidate Relationship Extraction
[Figure: candidate relationship extraction. Extractors: WordNet, WiBi, WikiRelations, WikiTaxonomy, Shallow Parsing Grammar and Sequential; NLP data is also an input. Candidate Relationships produced: isA, partOf, prerequisite and pedagogicallyClose.]
Path Based Extractors:
Mars isA Rocky planet (path length = 1, confidence = 1)
Mars isA Planet (path length = 2, confidence = 0.9)
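A path-based extractor can be sketched in Python as a breadth-first search over a hypernym graph. The slide gives only two data points (length 1 gives confidence 1, length 2 gives 0.9), so the linear decay used here is an assumption, and the tiny `HYPERNYMS` graph and all function names are hypothetical.

```python
from collections import deque

# Hypothetical hypernym graph: node -> direct hypernyms.
HYPERNYMS = {"Mars": ["Rocky planet"], "Rocky planet": ["Planet"]}


def path_length(source, target, hypernyms):
    """Shortest number of hypernym hops from source up to target, or None."""
    queue = deque([(source, 0)])
    visited = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target and dist > 0:
            return dist
        for parent in hypernyms.get(node, ()):
            if parent not in visited:
                visited.add(parent)
                queue.append((parent, dist + 1))
    return None


def confidence(length, decay=0.1):
    """Assumed linear decay: 1.0 for a direct link, 0.9 for two hops, ..."""
    return max(0.0, 1.0 - decay * (length - 1))


def is_a_candidates(topics, hypernyms):
    """Candidate isA relationships between every ordered pair of topics."""
    found = []
    for a in topics:
        for b in topics:
            if a == b:
                continue
            length = path_length(a, b, hypernyms)
            if length is not None:
                found.append((a, "isA", b, length, confidence(length)))
    return found


if __name__ == "__main__":
    for candidate in is_a_candidates(["Mars", "Rocky planet", "Planet"], HYPERNYMS):
        print(candidate)
```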
• WikiRelations: set of tuples that state the relationships between Wikipedia categories.
WikiRelations Tuples:
T Tauri star, Star, isA
Radiation, Radio waves, partOf
Light, Electromagnetic radiation, partOf
007 license to kill, video games, isA
…………
Topic: Light (Cat1: Light, Cat2: …)
Topic: Electromagnetic radiation (Cat1: Electromagnetic radiation)
Topic: ……
Result: Light partOf Electromagnetic radiation (Confidence = 0.7)
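The lookup can be sketched in Python: each topic is represented by its Wikipedia categories, and a tuple between a category of one topic and a category of another yields a candidate relationship. The tuples and categories below are taken from the slide; treating 0.7 as a fixed default confidence for this extractor is an assumption, and the function name is hypothetical.

```python
# Tuples from the slide: (category A, category B) -> relationship.
WIKI_RELATIONS = {
    ("T Tauri star", "Star"): "isA",
    ("Radiation", "Radio waves"): "partOf",
    ("Light", "Electromagnetic radiation"): "partOf",
}

# Topic -> Wikipedia categories it is mapped to (from the slide's example).
TOPIC_CATEGORIES = {
    "Light": ["Light"],
    "Electromagnetic radiation": ["Electromagnetic radiation"],
}


def wikirelations_candidates(topic_categories, tuples, confidence=0.7):
    """Candidate relationships between topics whose categories are related."""
    found = []
    for a, cats_a in topic_categories.items():
        for b, cats_b in topic_categories.items():
            if a == b:
                continue
            for cat_a in cats_a:
                for cat_b in cats_b:
                    relation = tuples.get((cat_a, cat_b))
                    if relation is not None:
                        found.append((a, relation, b, confidence))
    return found


if __name__ == "__main__":
    print(wikirelations_candidates(TOPIC_CATEGORIES, WIKI_RELATIONS))
```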
• Extractor based on the rules defined in (Larrañaga, 2012).
Topics: Solar System, Earth, Planet, Mars
Find Mentions (Textbook): sentences with mentions, e.g. "Earth is part of the Solar System."
Constraint Grammar applied to POS tags, e.g. Topic + [*] + part of + [det] + Topic
Relationships: Earth partOf Solar System, …
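The rule "Topic + [*] + part of + [det] + Topic" can be approximated in Python with a regular expression over raw text. This is a simplification: the extractor described here applies a constraint grammar to POS tags, whereas the sketch matches surface strings, and its short determiner list and function name are assumptions.

```python
import re
from itertools import permutations


def extract_part_of(sentences, topics):
    """Find 'A ... part of [det] B' for every ordered pair of topics."""
    relations = []
    for a, b in permutations(topics, 2):
        pattern = re.compile(
            rf"\b{re.escape(a)}\b.*?\bpart of\s+(?:(?:the|a|an)\s+)?{re.escape(b)}\b",
            re.IGNORECASE,
        )
        relation = (a, "partOf", b)
        if relation not in relations and any(pattern.search(s) for s in sentences):
            relations.append(relation)
    return relations


if __name__ == "__main__":
    topics = ["Solar System", "Earth", "Planet", "Mars"]
    sentences = ["Earth is part of the Solar System."]
    print(extract_part_of(sentences, topics))
```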
Topics: Wavelength, Emission spectrum, Planet, Solar System
Find Mentions (Textbook): sentences with mentions, e.g.
"...leading to different radiated wavelengths, make up an emission spectrum."
"... the emission spectrum of a particular star, the wavelength of …"
Possible candidates: Wavelength, Emission Spectrum (2 times)
Look links in/links out on Wikipedia: Emission spectrum (link out) Wavelength; Wavelength (link out) Emission spectrum
Relatedness > threshold
Reasoner → Relations: Emission spectrum pedagogicallyClose Wavelength, …
[Figure: link-based reasoning over Topic1, Topic2, Topic3 and Topic4.]
Mentions (Links), as shown: Topic3, 4 mentions; Topic4, 1 mention; Topic2, 3 mentions; Topic1, 4 mentions
Topic1 is pedagogicallyClose to Topic2
Topic3 is a prerequisite of Topic4
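One way to picture the reasoner is the Python sketch below. It rests on our reading of the figure: roughly balanced mention counts in both directions suggest pedagogicallyClose, while a strong imbalance suggests that the frequently mentioned topic is a prerequisite of the one mentioning it. The assignment of counts to directions, the `ratio` parameter and the function name are all assumptions, not the documented LiReWi rule.

```python
def classify(a, b, mentions, ratio=2.0):
    """Classify a topic pair from directed mention (link) counts.

    mentions[(x, y)] is the number of times x's article mentions y.
    Returns a (subject, relation, object) triple, or None without evidence.
    """
    a_to_b = mentions.get((a, b), 0)
    b_to_a = mentions.get((b, a), 0)
    if a_to_b == 0 and b_to_a == 0:
        return None
    if a_to_b >= ratio * b_to_a:
        return (b, "prerequisite", a)  # a leans heavily on b
    if b_to_a >= ratio * a_to_b:
        return (a, "prerequisite", b)  # b leans heavily on a
    return (a, "pedagogicallyClose", b)


# Counts from the slide, with directions assigned under our reading.
MENTIONS = {
    ("Topic1", "Topic2"): 3,
    ("Topic2", "Topic1"): 4,
    ("Topic3", "Topic4"): 1,
    ("Topic4", "Topic3"): 4,
}

if __name__ == "__main__":
    print(classify("Topic1", "Topic2", MENTIONS))
    print(classify("Topic3", "Topic4", MENTIONS))
```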
Combination & Filtering Relationships
Candidate relationships:
- Earth isA Planet (WordNet Ex) (Conf=1)
- Earth isA Planet (WikiRelations Ex) (Conf=0.8)
- Planet isA Earth (WikiTax Ex) (Conf=0.7)
- Earth partOf Solar System (WordNet Ex) (Conf=1)
- Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
Confidence Combiner (relationships combined):
- Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1)
- Planet isA Earth (WikiTax Ex) (Conf=0.7)
- Earth partOf Solar System (WordNet Ex) (Conf=1)
Conflict Resolver (conflict resolution), discarding Planet isA Earth (WikiTax Ex) (Conf=0.7):
- Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1)
- Earth partOf Solar System (WordNet Ex) (Conf=1)
- Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
Filter (below threshold), discarding Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
Final Relationships:
- Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1)
- Earth partOf Solar System (WordNet Ex) (Conf=1)
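The three stages can be sketched in Python with the candidates from this example. The slide does not state how confidences are combined or which threshold is used, so taking the maximum confidence, dropping the weaker of two inverse relationships, and the 0.6 threshold are all assumptions; function names are hypothetical.

```python
ASYMMETRIC = {"isA", "partOf", "prerequisite"}

# (subject, relation, object, extractor, confidence), from the slide.
CANDIDATES = [
    ("Earth", "isA", "Planet", "WordNet", 1.0),
    ("Earth", "isA", "Planet", "WikiRelations", 0.8),
    ("Planet", "isA", "Earth", "WikiTax", 0.7),
    ("Earth", "partOf", "Solar System", "WordNet", 1.0),
    ("Earth", "isA", "Terrestrial Planet", "WikiTax", 0.5),
]


def combine(candidates):
    """Merge duplicates, keeping all sources and the maximum confidence."""
    merged = {}
    for subj, rel, obj, source, conf in candidates:
        sources, best = merged.get((subj, rel, obj), ([], 0.0))
        merged[(subj, rel, obj)] = (sources + [source], max(best, conf))
    return merged


def resolve_conflicts(merged):
    """Drop a relationship when its inverse has a higher confidence."""
    kept = {}
    for (subj, rel, obj), (sources, conf) in merged.items():
        inverse = merged.get((obj, rel, subj))
        if rel in ASYMMETRIC and inverse is not None and inverse[1] > conf:
            continue
        kept[(subj, rel, obj)] = (sources, conf)
    return kept


def filter_by_confidence(relations, threshold=0.6):
    """Keep relationships at or above the (assumed) threshold."""
    return {key: value for key, value in relations.items() if value[1] >= threshold}


FINAL = filter_by_confidence(resolve_conflicts(combine(CANDIDATES)))

if __name__ == "__main__":
    for triple, (sources, conf) in FINAL.items():
        print(triple, sources, conf)
```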
Evaluation
Tuning up
• Introduction to Object Oriented Programming textbook.
Evaluation
• Gold Standard and Expert Validation.
• Introduction to Astronomy textbook.
• Gold Standard: four experts defined the set of relationships.
• A subset of the main domain topics was used, selected according to the score given by LiTeWi.
Results
            Precision (%)  Recall (%)  F1-Score (%)  Expert Validation (%)
LiReWi      36.21          50.57       42.42         43.98
DOM-Sortze  63.27          20.74       31.24         N.A.
V. Gathering Multilingual Learning Objects
Introduction
In DOM-Sortze, LOs were acquired for Basque using shallow parsing.
The approach has been validated for English.
LiLoWi has been developed to move towards the elicitation of multilingual LOs.
Adapting Learning Object elicitation to English
         Basque                                                  English
Pattern  adibidez, @topic                                        for instance, @topic
Example  Uretan, adibidez hidrogeno eta oxigeno atomoak daude.   For instance, there are hydrogen and oxygen atoms in water.
[Figure: LO elicitation. Topics (Wavelength, Emission spectrum, Earth, Solar System) → Find Mentions in the Textbook → Sentences with mentions (e.g. "Earth is a planet.") → Grammar → Learning Objects (e.g. "The Moon is Earth's only natural satellite").]
Evaluation
Gold Standard and Expert Validation:
• Evaluated on Introduction to Object Oriented Programming.
• Gold Standard built by some experts.
Two Aspects
• Grammar.
• Learning Objects.
Evaluation
• Grammar
               Definitions  Examples  Prob. Stat.  Princ. Stat.  Total
Found          164          1         12           49            226
Correct        138          1         7            35            181
Precision (%)  84.15        100       58.33        71.43         80.09
• Learning Objects
            Recall (%)  Expert Validation (%)
DOM-Sortze  70.31       91.88
LiDom       75.93       86.79
LiLoWi
[Figure: LiLoWi. For the Topics (Solar System, Emission spectrum, Earth, …), Multilingual LOs are obtained from WordNet/Wikipedia and passed to a Metadata Generator. The resulting LOs (LO1en, LO2en, LO2es) are linked as Equivalents.]
• Evaluated on the Principles of Object-Oriented Programming.
• Used the same LDO described in the previous experiment.
• Expert Validation.
Two Aspects
• How LiLoWi enhanced the LO coverage for the LDO topics.
• How many multilingual LOs are extracted.
Evaluation
Results
• Grammar-based approach
                     Total  Definitions
Number of topics     21     19
Topics coverage (%)  25.61  19.51
• Grammar + Wikipedia/WordNet
Column headers shown: Definitions, References; English, Spanish, Basque, French
Number of topics / Topic coverage (%), in slide order: 46 / 56.10; 36 / 43.90; 9 / 10.97; 36 / 43.90; 12 / 14.63
VI. Conclusions and Future Work
1. Provided a suitable formalism to represent Multilingual Domain Modules.
2. Developed a method for the elicitation of multilingual terminology.
– To our knowledge, the first term extractor based on search patterns for educational content.
3. Improved Relationship Acquisition.
– Extended the outline processor to English and enhanced it with Wikipedia.
– Developed LiReWi, a module for the elicitation of pedagogical relationships for Educational Ontologies.
– Developed a state-of-the-art mapper from Wikipedia to WordNet.
4. Developed a method for multilingual LO generation.
– Extended DOM-Sortze to English.
– Developed LiLoWi, a module for the elicitation of multilingual LOs using different knowledge bases.
Goal Achievement
• Automatising the inclusion of new languages.
• Multilingual Learning Object generation from similarity and machine
translation techniques.
• Concept Map-Based Learning Object Generation.
• Improvements on each module of LiDom Builder.
Future Work
Software Released
Software
• LiTeWi, released with Spanish/English support: https://github.com/Neuw84/LiTe
• Wikipedia/WordNet mapper: https://github.com/Neuw84/Wikipedia2WordNet
• Spanish stemmer: https://github.com/Neuw84/SpanishInflectorStemmer
• Training Data for Wikiminer: https://github.com/Neuw84/Wikipedia353Spanish
• LiReWi: coming soon….
Web Demo
• LiDom builder : http://galan.ehu.es/lidom/
Publications
A Combined Approach for Eliciting Relationships for Educational Ontologies Using Several
Knowledge Bases.
Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga.
Journal of Knowledge-Based Systems. Submitted.
LiteWi: A Combined Term Extraction Method for Eliciting Educational Ontologies from Textbooks.
Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga, Dan Roth.
Journal of the Association for Information Science and Technology, 67(2), pp. 380–399, 2016.
Testing Language Independence in the Semiautomatic Construction of Educational Ontologies.
Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga.
12th International Conference on Intelligent Tutoring Systems ITS 2014, Springer, Vol. 8474, pp.
545-550, 2014.
Automatic Generation of the Domain Module from Electronic Textbooks. Method and Validation.
Mikel Larrañaga, Ángel Conde, Iñaki Calvo, Jon A. Elorriaga, Ana Arruarte
IEEE Transactions on Knowledge and Data Engineering, 26(1), pp. 69-82, 2014.
Automating the Authoring of Learning Material in Computer Engineering Education.
Ángel Conde, Mikel Larrañaga, Iñaki Calvo, Jon A. Elorriaga, Ana Arruarte.
42nd Frontiers in Education Conference, pp. 1376-1381, 2012.
LiDom Builder: Automatising the Construction of Multilingual Domain
Ángel Conde Manjón
GaLan Research Group – LSI department, University of the
Basque Country (UPV/EHU)
Supervisors:
Mikel Larrañaga Olagaray & Ana Arruarte Lasa
UPV/EHU

Ph.D. Defense

  • 1. LiDom Builder: Automatising the Construction of Multilingual Domain Modules. Ángel Conde Manjón, GaLan Research Group – LSI Department, University of the Basque Country (UPV/EHU). Supervisors: Dr. Mikel Larrañaga Olagaray & Dr. Ana Arruarte Lasa (UPV/EHU). 25 February 2016.
  • 2. Context. Technology Supported Learning Systems (TSLS): Learning Management Systems; Massive Open Online Courses; Intelligent Tutoring Systems (SQL-Tutor); … Bilingual and multilingual contexts are a reality (Unesco, 2003). Acquiring the Domain Module is a cost- and work-intensive task.
  • 3. Main Goal: automatising the construction of MULTILINGUAL DOMAIN MODULES.
  • 4. Previous Work: DOM-Sortze (Larrañaga, 2012), a framework for building DOMAIN MODULES from electronic textbooks.
  • 5. Previous Work: DOM-Sortze (diagram). The Electronic Textbook goes through (1) Preprocess, which produces the Document Body Internal Representation and the Document Outline Internal Representation, (2) LDO Gathering, which produces the Learning Domain Ontology, and (3) LOs Gathering, which completes the Domain Module.
  • 6. DOM-Sortze: Domain Module Representation Formalism. Learning Domain Ontology (LDO): topics and pedagogical relationships. Learning Objects (LO): definitions, examples, problem statements, … Example diagram: the topics Planetary System, Solar System, Moon, Satellite, Planet and Earth are linked by partOf, isA and prerequisite relationships; the Learning Object LO1 ("The Moon is Earth's only natural satellite") is attached through hasDR.
  • 7. DOM-Sortze: Limitations. 1. Developed for a single language: Basque. 2. Its formalism is not able to represent Multilingual Domain Modules.
  • 8. Goals: automatising the construction of MULTILINGUAL DOMAIN MODULES. 1. Can the formalism used in DOM-Sortze be enhanced for Multilingual Domain Modules? – Extend the formalism to deal with Multilingual Domain Modules. 2. Which enhancements are required to deal with various languages? – Develop a method for extracting Multilingual Terminology. – Improve the Relationship Acquisition. – Provide a method for acquiring Multilingual Learning Objects.
  • 9. Outline. I. Introduction: Motivations and Goals. II. LiDom Builder: Building Multilingual Domain Modules. III. Acquisition of Multilingual Terminology. IV. Identification of Pedagogical Relationships. V. Gathering Multilingual Learning Objects. VI. Conclusions and Future Work.
  • 11. LiDom Builder Overview. LiDom Builder: a framework for automatising the acquisition of Multilingual Domain Modules. Diagram components: Textbook; Multilingual Terminology Extraction; Pedagogical Relationship Extraction; Multilingual Learning Object Generation; Domain Module.
  • 12. Multilingual Domain Module Formalism (diagram). The example ontology (Planetary System, Solar System, Moon, Satellite, Planet, Earth; partOf, isA, prerequisite and pedagogicallyClose relationships) is extended with language-tagged labels ("ilargi" @eu, "moon" @en, "luna" @es), equivalents (Equiv. "en", Equiv. "es") and the Learning Objects LO1 and LO2, attached through hasDR.
  • 13. Proposed Enhancements (diagram). The DOM-Sortze pipeline (Electronic Textbook; 1 Preprocess, producing the Document Internal Representation and the Document Outline Internal Representation; 2 LDO Gathering, producing the Learning Domain Ontology; 3 LOs Gathering; Domain Module) gains a step 0, Language Identification. NLP parsers: Illinois Chunker, Illinois POS tagger, FreeLing, IXA-Pipes. New tools: LiTeWi (Topic Extraction), LiReWi (Relationship Extraction: Set of Heuristics, Grammar), LiLoWi (Multilingual LOs: Grammar, Discourse Markers).
  • 14. Proposed Enhancements (diagram, continued). The same pipeline, with Knowledge Resources (…) added.
  • 15. General Evaluation Methodology. Two phases. Tuning up: set the thresholds and default confidence values. Evaluation: Gold Standard (Recall, Precision, F1-Score) and expert validation. Three textbooks are used: 1. Programming: Introduction to Object Oriented Programming (Wong, S., 2010). 2. Astronomy: Introduction to Astronomy (Morison, 2008). 3. Biology: Introduction to Molecular Biology (Raineri, 2010).
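The gold-standard measures named on this slide can be illustrated with a minimal Python sketch. It assumes that both the extracted items and the gold standard are plain sets of strings; the term lists below are hypothetical and are not taken from the evaluated textbooks.

```python
def precision_recall_f1(extracted, gold):
    """Return (precision, recall, f1) for two collections of items."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical term lists, for illustration only.
gold_terms = {"class", "object", "inheritance", "array list"}
extracted_terms = {"class", "object", "stack", "java", "inheritance"}
p, r, f1 = precision_recall_f1(extracted_terms, gold_terms)
```

Expert validation ("correctness") is a separate, manual judgement and is not modelled here.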
  • 17. Acquisition of Multilingual Terminology. In DOM-Sortze, terminology was extracted with ErauzTerm (Alegria et al., 2004). A new tool called LiTeWi has been developed.
  • 18. LiTeWi (pipeline diagram). Inputs: Electronic Textbook, Generic Corpus. Candidate Extraction techniques: TF-IDF, KP-Miner, CValue, Shallow Parsing Grammar. Stages shown: Candidate Extraction, Combination, Candidate Selection, Mapping, Disambiguation, Filtering, Mapping to other languages.
  • 19. Shallow Parsing Algorithm. Uses a grammar derived from (Larrañaga, 2012): a Constraint Grammar applied to POS tags. Diagram: textbook sentences may contain topics (e.g. "This is called an Array List", "A Stack is used to model systems that exhibit LIFO…"); the Shallow Parser produces chunks ("an Array List", "A Stack", …); the extraction rules of the grammar (e.g. Topic + [*]+ part of + [det] + Topic) yield the topics (Array List, Stack, …).
  • 21. Mapping. Terms are mapped to their corresponding Wikipedia articles. A search procedure matches Wikipedia article titles and their labels.
  • 23. Disambiguation. Method based on global disambiguation (Milne et al., 2008). A domain knowledge step is added to improve the results: the important terms of the domain are used as the disambiguation context. Gold Term List: important domain terms with only one sense, i.e. the monosemic terms that have the highest CValue score.
  • 24. Disambiguation (example). The Term List to disambiguate (Java, Inheritance, Property) is compared with the Gold Term List (Class, Programming Language, Array List) through the Wikiminer Compare Service. Scores for the candidate senses of Java:
        Sense            Class   Prog. Lang.   Array List
        Prog. Language   0.90    0.85          0.64
        Island           0.7     0.77          0.53
        City             0.56    0.75          0.6
      Average: 0.89, 0.70, 0.63.
      Disambiguated term: Java (programming language).
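One way to read this example is that each candidate sense is scored by its mean relatedness to the gold terms and the best-scoring sense wins. The Python sketch below follows that reading. The averaging rule is an assumption, the pairwise scores are copied from the slide's table, and the Wikiminer Compare Service that would produce such scores is not called.

```python
def pick_sense(scores_by_sense):
    """Return the sense with the highest mean relatedness to the gold terms."""
    def mean_score(sense):
        values = list(scores_by_sense[sense].values())
        return sum(values) / len(values)

    return max(scores_by_sense, key=mean_score)


# Relatedness of each candidate sense of "Java" to the gold terms,
# as listed on the slide (illustrative figures).
java_senses = {
    "Programming language": {"Class": 0.90, "Programming Language": 0.85, "Array List": 0.64},
    "Island": {"Class": 0.7, "Programming Language": 0.77, "Array List": 0.53},
    "City": {"Class": 0.56, "Programming Language": 0.75, "Array List": 0.6},
}
best_sense = pick_sense(java_senses)
```

With these figures the programming-language sense is selected, which agrees with the disambiguated term shown on the slide.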
  • 26. Filtering Unwanted Terms (diagram). The Term List to filter (Universal Studios, Planet, Windows 98) is compared with the Gold Term List (Solar System, Black Hole, Solar Mass) through the Wikiminer Compare Service. Two checks are shown: Relatedness Score (Threshold >= 0.6) and Number of Related Gold Terms (N > 1). Planet and Windows 98 pass the first check; only Planet passes the second and is kept as a Domain Related Term. Universal Studios and Windows 98 are discarded. Scores shown in the diagram: Solar System (0.34), Black Hole (0.53), Solar Mass (0.47); Solar System (0.23), Black Hole (0.68), Solar Mass (0.50).
  • 27. LiTeWi: Mapping to other languages (same pipeline diagram as slide 18). Example: Topic: Moon; EN: Moon; ES: Luna; EU: Ilargia.
  • 28. Evaluation. Tuning up: Introduction to Object Oriented Programming textbook. Evaluation: Gold Standard and Expert Validation. The Gold Standard is based on the terms appearing in the index of each textbook. Evaluated on Introduction to Astronomy and Introduction to Molecular Biology.
  • 29. Results. Gold-Standard measures: Precision, Recall, F1 Score. Expert Validation measure: Correctness.
      Wikifier (Cheng, 2013):
        Textbook       Precision (%)   Recall (%)   F1 Score (%)   Correctness (%)
        Astronomy      3.55            62.96        6.72           18.55
        Mol. Biology   2.24            10.21        3.67           49.27
      LiTeWi:
        Textbook       Precision (%)   Recall (%)   F1 Score (%)   Correctness (%)
        Astronomy      17.96           72.55        28.79          78.77
        Mol. Biology   27.09           50.53        87.70          71.65
  • 31. Identification of Pedagogical Relationships: Introduction. In DOM-Sortze, relationships were acquired for Basque using Shallow Parsing. An adaptation and extension of the heuristic-based analysis of the outline has been developed. A new tool called LiReWi has been developed.
  • 32. Heuristic-based analysis of the outline. Document Outlines: they reflect the organization made by the author; the structure of the outline underlies pedagogical relationships; low cost process (summarised). DOM-Sortze: each outline item is considered a domain topic; by default a partOf relation is gathered between an item and its subitems; heuristics detect isA relations. LiDom Builder: adaptation to English of the heuristics from (Larrañaga et al., 2004); improvement of isA identification using Wikitaxonomy (Ponzetto et al., 2007).
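The default rule on this slide (a partOf relation between an item and its subitems) can be sketched in Python as follows. This is only an illustration under stated assumptions: the outline is assumed to use the "4.5.1.- Title" numbering style seen in the deck's outline example, the function name is hypothetical, and the isA heuristics are not included.

```python
import re

# Matches items such as "4.5.1.- Thermosettings" (number, then ".-", then title).
ITEM = re.compile(r"^\s*(\d+(?:\.\d+)*)\.-\s*(.+?)\s*$")


def part_of_relations(outline_lines):
    """Return (subitem, 'partOf', parent item) triples for a numbered outline."""
    titles = {}
    relations = []
    for line in outline_lines:
        match = ITEM.match(line)
        if not match:
            continue
        number, title = match.groups()
        titles[number] = title
        parent_number = number.rpartition(".")[0]  # "4.5.1" -> "4.5", "4" -> ""
        if parent_number in titles:
            relations.append((title, "partOf", titles[parent_number]))
    return relations


outline = [
    "4.5.- Families of polymeric materials",
    "4.5.1.- Thermosettings",
    "4.5.2.- Thermoplastics",
    "4.5.3.- Elastomers",
]
relations = part_of_relations(outline)
```

Each of the three subitems ends up linked to "Families of polymeric materials" by a partOf triple; top-level items have no parent and produce no relation.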
  • 33. Wikipedia Enhanced Process. Example outline:
        4.- Structure of polymers / Macromolecules
        4.1.- Polymer chemistry
        4.2.- Molecular weight
        4.3.- Form, structure and molecular configuration
        4.3.- Supramolecular arrangement
        4.4.- Crystalline and amorphous polymers
        4.5.- Families of polymeric materials
        4.5.1.- Thermosettings
        4.5.2.- Thermoplastics
        4.5.3.- Elastomers
        5.- Phase diagrams / Definitions
        5.1.- Solid solutions
        5.2.- Phases rule of Gibbs
        5.3.- Types of phase diagram
      Steps:
        1. Identify groups of sibling nodes.
        2. Select the groups of leaf nodes in which the partOf relationship has been identified.
        3. Link and disambiguate each node to a Wikipedia article using Wikiminer (Milne et al., 2012). Example: Thermosettings polymer (Article id = 321827), Thermoplastic (Article id = 182444), Elastomer (Article id = 842224).
        4. Process every group using the (Ponzetto et al., 2007) taxonomy. Categories shown in the example: Materials science, Elastomers, Polymer physics, Polymer chemistry.
        5. Infer the isA relationship in those groups that share a common ancestor.
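Step 5 can be illustrated with a small Python sketch. Several things here are assumptions rather than the deck's actual implementation: the taxonomy is a hypothetical child-to-parents dictionary (not real WikiTaxonomy data), the function names are invented, and the inferred isA relation is taken to link each sibling to its outline parent.

```python
def ancestors(node, parents):
    """All categories reachable from node by following parent links."""
    seen, stack = set(), list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def infer_is_a(parent_topic, siblings, parents):
    """Return isA triples when all siblings share at least one ancestor."""
    if not siblings:
        return []
    common = set.intersection(*(ancestors(s, parents) for s in siblings))
    if not common:
        return []
    return [(sibling, "isA", parent_topic) for sibling in siblings]


# Hypothetical toy taxonomy, for illustration only.
toy_taxonomy = {
    "Thermosetting polymer": ["Polymers"],
    "Thermoplastic": ["Polymers"],
    "Elastomer": ["Elastomers"],
    "Elastomers": ["Polymers"],
    "Polymers": ["Materials science"],
}
group = ["Thermosetting polymer", "Thermoplastic", "Elastomer"]
is_a = infer_is_a("Families of polymeric materials", group, toy_taxonomy)
```

In this toy taxonomy the three siblings share the ancestors "Polymers" and "Materials science", so three isA triples are produced; a group with no shared ancestor yields none and would keep its default partOf relations.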
  • 34. Evaluation. Gold Standard: 57 document outlines in English from different domains. Human instructors defined the optimal output (LDOs). Each LDO is restricted to the topics of the outline.
  • 35. Results.
      Heuristic Analysis:
                        partOf   isA     Total
        Precision (%)   84.12    78.95   83.85
        Recall (%)      98.66    21.20   83.85
      Heuristic Analysis + Wikipedia Enhanced Process:
                        partOf   isA     Total
        Precision (%)   89.19    77.30   87.70
        Recall (%)      96.49    50.53   87.70
  • 36. Identification of Pedagogical Relationships: LiReWi (pipeline diagram). Inputs: Electronic Textbook, Topics, Knowledge Bases. Stages: Mapping, Candidate Relationship Extraction, Combination & Filtering.
  • 37. Mapping (diagram): Mapping to WordNet and Disambiguation. Example: Topic: Syntax, Wikipedia id = 3206060, WordNet id = ? Two mapping tables are consulted, Fernando's Mappings and Babelnet Mappings; for Wiki id 3206060 one lists WordNet id 8436203, … and the other lists WordNet id 6176322, … A Comparer checks the two results (Syntax, WordNet id = 6176322 versus Syntax, WordNet id = 8436203). When they differ (!=), Page Rank Disambiguation is applied to the candidate WordNet ids (8436203, 6176322, …) with the disambiguation context (Java, Programming, …). Final mapped id returned: WordNet id = 6176322.
  • 39. Candidate Relationship Extraction. Extractors: WordNet Extractor, Wibi Extractor, WikiRelations Extractor, WikiTaxonomy Extractor, Shallow Parsing Grammar Extractor, Sequential Extractor. Inputs include NLP data. Candidate Relationships produced: isA, partOf, prerequisite, pedagogicallyClose.
  • 40. Candidate Relationship Extraction. Path Based Extractors: example with the topics Mars, Rocky planet and Planet linked by isA. A relationship found through a path of length 1 gets confidence=1; a relationship found through a path of length 2 gets confidence=0.9.
  • 41. Candidate Relationship Extraction. • WikiRelations: Set of tuples that state the relationships between Wikipedia categories. WikiRelations Tuples (examples): T Tauri star, Star, isA; Radiation, Radio waves, partOf; Light, Electromagnetic radiation, partOf; 007 license to kill, video games, isA; … Topics are matched through their categories (Topic: Light, Cat1: Light, Cat2: …; Topic: Electromagnetic radiation, Cat1: Electromagnetic radiation; Topic: …). Result: Light partOf Electromagnetic radiation (Confidence=0.7).
  • 42. Candidate Relationship Extraction. • Extractor based on the rules defined in (Larrañaga, 2012). Inputs: Textbook and Topics (Solar System, Earth, Planet, Mars). Find Mentions yields Sentences with mentions ("Earth is part of the Solar System.", …). Constraint Grammar applied to POS tags (Grammar: Topic + [*]+ part of + [det] +Topic, …) yields Relationships (Earth partOf Solar System, …).
  • 43. Candidate Relationship Extraction. Inputs: Textbook and Topics (Wavelength, Emission spectrum, Planet, Solar System). Find Mentions yields Sentences with mentions ("...leading to different radiated wavelengths, make up an emission spectrum.", "... the emission spectrum of a particular star, the wavelength of …", …). Possible candidates: Wavelength, Emission Spectrum (2 times). Look links in/links out on Wikipedia: Emission spectrum (link out) Wavelength; Wavelength (link out) Emission spectrum. Reasoner: Relatedness > threshold. Relations: Emission spectrum pedagogicallyClose Wavelength, …
  • 44. Candidate Relationship Extraction. Four topics (Topic1, Topic2, Topic3, Topic4), each annotated with its Mentions (Links) to other topics: Topic3, 4 mentions; Topic4, 1 mention; Topic2, 3 mentions; Topic1, 4 mentions; … Resulting relationships: Topic1 is pedagogicallyClose to Topic2; Topic3 is a prerequisite of Topic4.
  • 45. Identification of Pedagogical Relationships: LiReWi. Inputs: Electronic Textbook, Topics, Knowledge Bases. Stages: Mapping, Candidate Relationship Extraction, Combination & Filtering. Output: Learning Domain Ontology.
  • 46. Combination & Filtering
      Relationships (candidates):
        - Earth isA Planet (WordNet Ex) (Conf=1)
        - Earth isA Planet (WikiRelations Ex) (Conf=0.8)
        - Planet isA Earth (WikiTax Ex) (Conf=0.7)
        - Earth partOf Solar System (WordNet Ex) (Conf=1)
        - Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
      Confidence Combiner (Relationships combined):
        - Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1)
        - Planet isA Earth (WikiTax Ex) (Conf=0.7)
        - Earth partOf Solar System (WordNet Ex) (Conf=1)
        - Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
      Conflict Resolver (Conflict Resolution), removed:
        - Planet isA Earth (WikiTax Ex) (Conf=0.7)
      Filter (Filter below threshold), removed:
        - Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
      Final Relationships:
        - Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1)
        - Earth partOf Solar System (WordNet Ex) (Conf=1)
  • 47. Evaluation. Tuning up: • Introduction to Object Oriented Programming textbook. Evaluation: • Gold Standard and Expert Validation. • Introduction to Astronomy textbook. • Gold standard, four experts stated the set of relationships. • Using a subset of the main domain topics according to the score given by LiTeWi.
  • 48. Results
      LiReWi: Precision (%) 36.21, Recall (%) 50.57, F1-Score (%) 42.42, Expert Validation (%) 43.98
      DOM-Sortze: Precision (%) 63.27, Recall (%) 20.74, F1-Score (%) 31.24, Expert Validation (%) N.A.
  • 49. Outline: I. Introduction: Motivations and Goals II. LiDom Builder: Building Multilingual Domain Modules III. Acquisition of Multilingual Terminology IV. Identification of Pedagogical Relationships V. Gathering Multilingual Learning Objects VI. Conclusions and Future Work
  • 50. Gathering Multilingual Learning Objects: Introduction. In DOM-Sortze, LO acquisition was carried out for Basque using Shallow Parsing. A validation of the approach for English has been carried out. LiLoWi has been developed to move towards the elicitation of Multilingual LOs.
  • 51. Adapting Learning Object elicitation to English
      Pattern: Basque "adibidez, @topic"; English "for instance, @topic"
      Example: Basque "Uretan, adibidez hidrogeno eta oxigeno atomoak daude."; English "For instance, there are hydrogen and oxygen atoms in water."
      Process: Textbook and Topics (Wavelength, Emission spectrum, Earth, Solar System); Find Mentions yields Sentences with mentions ("Earth is a planet.", …); the Grammar yields Learning Objects ("The Moon is Earth's only natural satellite").
  • 52. Evaluation. Gold Standard and Expert Validation: • Evaluated on Introduction to Object Oriented Programming. • Gold Standard built by some experts. Two Aspects: • Grammar. • Learning Objects.
  • 53. Evaluation
      Grammar:
        Found: Definitions 164, Examples 1, Prob. Stat. 12, Princ. Stat. 49, Total 226
        Correct: Definitions 138, Examples 1, Prob. Stat. 7, Princ. Stat. 35, Total 181
        Precision (%): Definitions 84.15, Examples 100, Prob. Stat. 58.33, Princ. Stat. 71.43, Total 80.09
      Learning Objects:
        DOM-Sortze: Recall (%) 70.31, Expert Validation (%) 91.88
        LiDom: Recall (%) 75.93, Expert Validation (%) 86.79
  • 54. LiLoWi. Input Topics: Solar System, Emission spectrum, Earth. Components: Multilingual LOs from WordNet/Wikipedia; Metadata Generator. Output LOs (LO1en, LO2en, LO2es), with LOs in different languages linked as Equivalents.
  • 55. Evaluation. • Evaluated on the Principles of Object-Oriented Programming. • Used the same LDO described in the previous experiment. • Expert Validation. Two Aspects: • How LiLoWi enhanced the LO coverage for the LDO topics. • How many multilingual LOs are extracted.
  • 56. Results
      Grammar-based approach:
        Number of topics: Total 21, Definitions 19
        Topics coverage (%): Total 25.61, Definitions 19.51
      Grammar + Wikipedia/WordNet (column headers on the slide: Definitions, References, English, Spanish, Basque, French):
        Number of topics / Topic coverage (%): 46 / 56.10, 36 / 43.90, 9 / 10.97, 36 / 43.90, 12 / 14.63
  • 57. Outline: I. Introduction: Motivation and Goals II. LiDom Builder: Building Multilingual Domain Modules III. Acquisition of Multilingual Terminology IV. Identification of Pedagogical Relationships V. Gathering Multilingual Learning Objects VI. Conclusions and Future Work
  • 58. Goal Achievement
      1. Provision of a suitable formalism to represent Multilingual Domain Modules.
      2. Developed a method for the elicitation of multilingual terminology.
         – First term extractor to our knowledge based on searching patterns for educational content.
      3. Relationship Acquisition has been improved.
         – Extension of outline processor to English + Enhancement with Wikipedia.
         – Development of LiReWi, a module for the elicitation of pedagogical relationships for Educational Ontologies.
         – Developed a state of the art mapper from Wikipedia to WordNet.
      4. Developed a method for multilingual LO generation.
         – Extension of DOM-Sortze for English.
         – Development of LiLoWi, a module for the elicitation of multilingual LOs using different knowledge bases.
  • 59. Future Work. • Automatising the inclusion of new languages. • Multilingual Learning Object generation from similarity and machine translation techniques. • Concept Map-Based Learning Object Generation. • Improvements on each module of LiDom Builder.
  • 60. Software Released
      Software:
        • LiTeWi, released with Spanish/English support: https://github.com/Neuw84/LiTe
        • Wikipedia/WordNet mapper: https://github.com/Neuw84/Wikipedia2WordNet
        • Spanish stemmer: https://github.com/Neuw84/SpanishInflectorStemmer
        • Training Data for Wikiminer: https://github.com/Neuw84/Wikipedia353Spanish
        • LiReWi: coming soon…
      Web Demo:
        • LiDom Builder: http://galan.ehu.es/lidom/
  • 61. Publications
      • A Combined Approach for Eliciting Relationships for Educational Ontologies Using Several Knowledge Bases. Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga. Journal of Knowledge-Based Systems. Submitted.
      • LiteWi: A Combined Term Extraction Method for Eliciting Educational Ontologies from Textbooks. Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga, Dan Roth. Journal of the Association for Information Science and Technology, 67(2), pp. 380–399, 2016.
      • Testing Language Independence in the Semiautomatic Construction of Educational Ontologies. Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga. 12th International Conference on Intelligent Tutoring Systems ITS 2014, Springer, Vol. 8474, pp. 545–550, 2014.
      • Automatic Generation of the Domain Module from Electronic Textbooks. Method and Validation. Mikel Larrañaga, Ángel Conde, Iñaki Calvo, Jon A. Elorriaga, Ana Arruarte. IEEE Transactions on Knowledge and Data Engineering, 26(1), pp. 69–82, 2014.
      • Automating the Authoring of Learning Material in Computer Engineering Education. Ángel Conde, Mikel Larrañaga, Iñaki Calvo, Jon A. Elorriaga, Ana Arruarte. 42nd Frontiers in Education Conference, pp. 1376–1381, 2012.
  • 62. LiDom Builder: Automatising the Construction of Multilingual Domain Modules. Ángel Conde Manjón, GaLan Research Group – LSI Department, University of the Basque Country (UPV/EHU). Supervisors: Mikel Larrañaga Olagaray & Ana Arruarte Lasa, UPV/EHU.
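
The Combination & Filtering stage of slide 46 can be sketched in Python roughly as follows. This is only an illustration built from the slide's Earth/Planet example. The function names, the choice of keeping the maximum confidence when several extractors propose the same relationship, the rule of dropping the weaker of two reversed relationships, and the 0.6 threshold are all assumptions; the slides do not state how LiReWi actually combines confidences or what threshold it uses.

```python
# Hypothetical sketch of a combine -> resolve conflicts -> filter pipeline.
# Candidate tuples: (source, relation, target, extractor, confidence),
# copied from the example on slide 46.
candidates = [
    ("Earth", "isA", "Planet", "WordNet", 1.0),
    ("Earth", "isA", "Planet", "WikiRelations", 0.8),
    ("Planet", "isA", "Earth", "WikiTax", 0.7),
    ("Earth", "partOf", "Solar System", "WordNet", 1.0),
    ("Earth", "isA", "Terrestrial Planet", "WikiTax", 0.5),
]


def combine(cands):
    """Merge identical relationships, collecting the extractors that found them.

    Assumption: the combined confidence is the maximum of the individual ones.
    """
    merged = {}
    for source, relation, target, extractor, conf in cands:
        key = (source, relation, target)
        extractors, best = merged.get(key, (set(), 0.0))
        merged[key] = (extractors | {extractor}, max(best, conf))
    return merged


def resolve_conflicts(merged):
    """Drop a relationship when its reverse exists with a higher confidence."""
    kept = {}
    for (source, relation, target), (extractors, conf) in merged.items():
        reverse = merged.get((target, relation, source))
        if reverse is not None and reverse[1] > conf:
            continue  # the reversed relationship is more trusted
        kept[(source, relation, target)] = (extractors, conf)
    return kept


def filter_below(relationships, threshold):
    """Keep only the relationships whose confidence reaches the threshold."""
    return {k: v for k, v in relationships.items() if v[1] >= threshold}


THRESHOLD = 0.6  # illustrative value, not taken from the slides
final = filter_below(resolve_conflicts(combine(candidates)), THRESHOLD)
for (source, relation, target), (extractors, conf) in final.items():
    print(source, relation, target, sorted(extractors), conf)
```

With these assumptions the sketch ends with the same two relationships as the slide: Earth isA Planet (WordNet, WikiRelations) and Earth partOf Solar System (WordNet).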

Editor's Notes

  1. Good morning, everybody. I'm Ángel Conde Manjón, a member of the GaLan research group at the University of the Basque Country. First of all, I would like to thank the committee members for attending this thesis defense. (This thesis has been developed under the supervision of Dr. Larrañaga and Dr. Arruarte and supported by the GaLan Research Group.) This thesis, called LiDom Builder, is about automatising the construction of Multilingual Domain Modules. (Superfluous: I am going to present my thesis to obtain the PhD degree in Computer Sciences from the University of the Basque Country.)
  2. Well, I am going to start with some facts to put this work in context. -- The first one is that Technology Supported Learning Systems are very popular and broadly used nowadays. For example… -- The second fact is that bilingual and multilingual… -- Finally, I must say that acquiring the domain module, that is, the TSLS content, is a cost and work intensive task. (Then, providing aid tools for building such systems, and especially tools for developing the learning content for those systems, is essential.)
  3. Then, providing tools for… automatising the construction of multilingual domain modules is the main goal of this work. Let's start with the introduction. Nowadays Technology Supported Learning Systems are very popular and broadly used. For example, … The Domain Module is the core of any TSLS: any TSLS requires an appropriate representation of the knowledge to be mastered by the student, i.e., the Domain Module. Building them is costly in terms of time and difficulty. Then, providing aid tools for building such systems, and especially tools for developing the learning content for those systems, is essential. I am going to start by putting the work in context; we start from three facts: - Technology is used more and more - Bilingual and multilingual contexts - The difficulty of building the domain module, which defines the information the systems require to do their job
  4. In a previous work, Larrañaga (2012) proposed a framework called DOM-Sortze to build domain… But why use textbooks? Because the authors of textbooks face the same problems when writing their books. They include information about the domain topics, definitions, examples, and even exercises that help in mastering the contents. Moreover, they structure the textbook in a way that facilitates understanding and learning.
  5. In DOM-Sortze, the Domain Module acquisition process entails three tasks: 1. First, the electronic textbook is prepared for the knowledge elicitation tasks. 2. Once the internal representations of the outline and the body of the textbook have been extracted, the LDO is generated. 3. Finally, after building the LDO, the LOs are gathered.
  6. In DOM-Sortze, the Domain Module is described by means of an Educational Ontology. The LDO contains the main domain topics and the pedagogical relationships among them. (I won't say this: pedagogical relationships can be structural, isA and partOf, or sequential, prerequisite and next.) It also includes the set of Learning Objects (LOs) that will be used for mastering each domain topic (definitions, examples, exercises, etc.). (Approach presented throughout this thesis.)
  7. DOM-Sortze has two limitations… First, it only supports the Basque language. Second, the formalism used represents Domain Modules in one language. But multilingual… Therefore, we should work on an answer for this need.
  8. Taking into account the previous work and the main goal, we should answer these questions to develop the specific objectives… These questions lead us to the presentation's structure.
  9. This presentation is organised in six main sections. We have already gone through the introduction. Next, our proposal for eliciting multilingual domain modules will be described. Then, the three main parts of the system will be presented. And finally, I will give some brief conclusions and future lines.
  10. Well, it's time to focus on LiDom Builder… the system we have built for building multilingual domain modules.
  11. LiDom Builder is a framework that we have developed to deal with the task of automatising the construction of Multilingual Domain Modules. An overview of the system is presented below, showing the three main tasks that need to be carried out in order to build Domain Modules: multilingual terminology acquisition, pedagogical relationship identification, and multilingual Learning Object generation.
  12. In the first place, to be able to deal with multilingual domain modules, the formalism presented in DOM-Sortze has been extended. That is, each topic has been assigned an identifier and different labels depending on the language. Moreover, the LO formalism has been extended to support different languages and to define equivalents (definitions in different languages).
  13. In this thesis the following enhancements are added to DOM-Sortze. First, a language identification procedure is added. For the preprocess, NLP parsers should be added for each language. Then, the topic extraction and the relation extraction processes have been extended in the LDO Gathering step. Finally, the LOs gathering has been enhanced by obtaining multilingual LOs.
  14. During the last few years, knowledge resources such as Wikipedia and WordNet have been used for terminology extraction, relation acquisition and… in general, for natural language processing. That's why they are incorporated into LiDom Builder. However, as working with Wikipedia entails a big effort due to its size, WikipediaMiner has been used to interact with it.
  15. For this work the following evaluation methodology has been used. Two phases… tuning, evaluation… Gold Standard, where the results are compared against it. Expert validation: as the results of the system may be interesting for mastering the topics but not be in the gold standard… an expert evaluation… 3 books of different domains have been used: programming, astronomy and biology. For some parts more resources have been used, but those will be explained later.
  16. Now it's time to focus on one of the main parts of LiDom Builder, the one that takes care of the acquisition of multilingual terminology.
  17. In DOM-Sortze, it was only possible to extract terminology using ErauzTerm (Alegria, 2004), for textbooks written in Basque. LiTeWi, a tool for eliciting multilingual terminology, has been developed.
  18. LiTeWi. Terminology acquisition entails two main steps: the identification of the candidates using diverse techniques, and the combination and refinement of the results to obtain the final set of terms. Algorithms by others (the first three): TF-IDF (Salton, 1988): besides the term frequency, considers the relevance of the terms in the corpus. KP-Miner (El-Beltagy, 2009): a rule-based keyphrase extractor for English and Arabic. CValue (Frantzi, 2000): takes into account the occurrence of term candidates as part of longer terms. Moreover, we have developed an algorithm called Shallow Parsing Grammar.
  19. The Shallow Parsing Grammar algorithm has been developed under the hypothesis that terms can be found where LO fragments may be found… (do not say this out loud because it is confusing?; keep it in case they ask) This algorithm uses a grammar derived from Larrañaga (2012), with which the fragments that may contain LOs are selected. To process the textbook with this algorithm, first we process it with a grammar based on the previously mentioned one, identifying grammar structures that may contain DRs. Then we process the sentences with a shallow parser in order to extract the Noun Phrases.
  20. Once all the algorithms have finished, the candidate selection process starts. First, a combination step is carried out where all the results from the different algorithms are merged. Next, a mapping procedure to Wikipedia is carried out.
  21. For mapping the terms to Wikipedia, a search procedure to match Wikipedia article titles and their labels is carried out. However… the same term may have different senses. For example… Then we need to disambiguate them. The first part is the Wikipedia Mapping part… The terms obtained in the previous step are related to their corresponding Wikipedia articles. Those not mapped are filtered out. This entails searching in Wikipedia to determine whether or not each selected term can be related to one or more Wikipedia articles, each one representing a possible sense/meaning of the term. Depending on the stemmer used, there is a trade-off between Precision and Recall. Problem:
  22. For addressing the problem of various senses a disambiguation step has been added.
  23. For disambiguating the terms, a method based… A domain knowledge step is added to improve the results. We use the Milne system with important domain terms as input, the so-called GOLD TERM LIST. How to choose them? After some analysis we realized that longer terms are usually more specific (you can see in the figure that)… We use CValue, which assigns more weight to those terms; furthermore, the top results are important in the domain.
      A method that uses the Milne and Witten global disambiguation approach (Milne2008) is used to fulfil this task, to which end the Wikiminer Compare Service is used. This service provides a way of disambiguating term pairs using a classifier that takes as features:
      • The data provided by Wikipedia. Wikipedia provides statistics about how an article label is associated with a sense/meaning. For example, 55% of "Java" labels refer to the programming language whereas 15% of them refer to the Indonesian island. These statistics yield three features for the classifier: the average, maximum and minimum prior probabilities of the two concepts.
      • The semantic relatedness between the concepts. The relatedness score can be computed using the links of the articles as features. Milne2013 claim that "Wikipedia articles reference each other extensively, and at first glance the links between them appear to be promising semantic relations. Unfortunately, the article also contains links to many irrelevant concepts (e.g. terms not related to the domain of the analyzed book). Therefore, an individual link between two Wikipedia articles cannot be trusted". There are different possibilities for computing the relatedness measure, for instance, using the article in-links (those inside the article that refer to other articles).
      Both measures use different sets of links. The normalized distance measure is based on an approach that looks for documents that mention the terms of interest, and has been adapted to use the links made to articles. The vector similarity measure is based on an approach that looks for terms mentioned within two documents of interest, and has been adapted to use the links contained within articles. However, there is no reason why each measure should not be applied to the other link direction. Thus, each of the measures described above yields two features, one for in-links and the other for out-links. Finally, another measure taking into account the link counts for each article could be used. Different configurations have been tested. As pointed out by Milne2013, the more features used, the higher the performance. Therefore, the measure that combined the links-in, links-out and link-counts was selected for computing the relatedness score.
      … the term size in n-grams (number of words composing the term) increases. Therefore, the more n-grams a term has, the more specific it is. Nevertheless, domain relevant terms are required. Hence, the monosemic terms with the highest CValue score are chosen for the gold term list.
  24. For disambiguating the terms, a method based… A domain knowledge step is added to improve the results. We use the Milne system with important domain terms as input, the so-called GOLD TERM LIST. How to choose them? After some analysis we realized that longer terms are usually more specific (you can see in the figure that)… We use CValue, which assigns more weight to those terms; furthermore, the top results are important in the domain. Finally, a majority vote procedure is carried out to obtain the final sense, involving the different outputs from the GOLD TERM LIST…
  25. After having disambiguated all the terms, we will try to filter out those not related to the domain in a filtering step.
  26. In this step, those terms which are not related to the domain are deleted. In this case we use the Astronomy domain. For this task, the gold term list built in the disambiguation step is used. This task attempts to relate each elicited term with the terms in the gold term list, to which end the Wikiminer Comparing Service has been employed. First, it discards those topics below the relatedness threshold. Then, it requires the defined threshold to be passed for more than N gold topics. First, the Wikiminer Comparing Service computes each term's domain relatedness. Those topics whose score is below the threshold are dropped. Finally, those terms which are related to at least the minimum number of gold terms are selected. … the candidate term to be related with at least one of the gold term list. Therefore, this is the set-up that achieves the best compromise between recall and precision.
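
The filtering rule described in note 26 could be sketched in Python as below. This is a hedged illustration only: `filter_by_domain`, `toy_relatedness`, the toy scores, the 0.5 threshold and the minimum of one related gold term are all hypothetical. In LiTeWi the relatedness values come from the Wikiminer Comparing Service, which is not called here.

```python
# Hypothetical sketch of the domain filter: a candidate term is kept when its
# relatedness reaches the threshold for at least `min_gold` gold terms.

def filter_by_domain(terms, gold_terms, relatedness, threshold, min_gold):
    """Return the terms related to at least `min_gold` gold terms."""
    kept = []
    for term in terms:
        hits = sum(1 for gold in gold_terms if relatedness(term, gold) >= threshold)
        if hits >= min_gold:
            kept.append(term)
    return kept


# Toy relatedness scores, invented for this example (not measured values).
TOY_SCORES = {
    ("redshift", "emission spectrum"): 0.7,
    ("redshift", "solar system"): 0.6,
    ("chapter", "emission spectrum"): 0.1,
    ("chapter", "solar system"): 0.2,
}


def toy_relatedness(term, gold):
    """Stand-in for a relatedness service; unknown pairs score 0.0."""
    return TOY_SCORES.get((term, gold), 0.0)


kept = filter_by_domain(
    terms=["redshift", "chapter"],
    gold_terms=["emission spectrum", "solar system"],
    relatedness=toy_relatedness,
    threshold=0.5,
    min_gold=1,
)
print(kept)
```

Passing the relatedness function as a parameter keeps the rule itself separate from whichever service supplies the scores, so the threshold and the minimum number of gold terms can be tuned independently.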
  27. The final step of LiTeWi entails mapping the terms to other languages; using Wikipedia information, we obtain those links directly whenever they are available.
  28. LiTeWi has been evaluated using Gold Standard and Expert validation. The Gold Standard has been based on the terms appearing in the textbooks. LiTeWi has been tuned up with… and evaluated on… The first book used for the evaluation is the Introduction to Astronomy (Morison, 2008) textbook. This book consists of 150 pages of plain text and over 110,000 words. The index is composed of 378 unique terms, of which 114 are single-word terms (1-grams), 189 are 2-grams, 57 are 3-grams, and 18 are 4-grams. 322 (out of 378) of the index terms were related to one or more Wikipedia articles. That is to say, 85.18% of the terms refer to at least one Wikipedia article, such a proportion being the best recall achievable. The second book used for the evaluation is the Introduction to Molecular Biology (Raineri, 2010). This book consists of 139 pages of plain text with over 70,000 words. The index is composed of 274 unique terms, of which 116 are single-word terms, 119 are 2-grams, 35 are 3-grams, 3 are 4-grams, and 1 is a 5-gram. For this textbook, 220 out of 274 of the index terms were related to one or more Wikipedia articles. Hence, the best achievable recall is 81.30%.
  29. In this table we can see the general results of LiTeWi; we have also tested the results of each step, but those are out of scope for this presentation. Generally better results. We can see that we have quite good results in the different domains; this can be related to the use of different algorithms and to the use of Wikipedia, which is multi-domain. The difference is especially remarkable in Recall…
  30. Well, after finishing with the acquisition of multilingual terminology, our focus moves to another part of LiDom Builder… the part that takes care of pedagogical relationships.
  31. In DOM-Sortze, the process of acquiring relationships is divided into two parts. On the one hand, a heuristic process obtains relationships from the outline. On the other hand, an algorithm extracts relationships from the whole textbook, but only for Basque. In this work, the identification of pedagogical relationships has been addressed as follows. The DOM-Sortze approach for outlines has been extended: it has been generalized for English and evaluated, and its knowledge acquisition results have been improved using Wikipedia. For the analysis of the whole textbook, a new tool called LiReWi has been designed that improves the acquired knowledge.
  32. First we will focus on the outline process. Why? DOM-Sortze uses a heuristic process for processing the outline. Each outline item is considered a domain topic. By default, partOf relationships are identified, and then isA relationships are refined using heuristics. Faulty isA identification was detected, caused by the lack of domain knowledge, for example when detecting diseases. In LiDom Builder, first an extension ….. To deal with this lack of knowledge, WikiTaxonomy is used: Ponzetto and Strube (2007) derived a large-scale taxonomy containing isA relationships from Wikipedia. The structure of the document outline is used as a means to gather pedagogical relationships. A subitem of a general topic is used to explain part of it or a particular case of it. The outline analysis process consists of two phases. In the basic analysis, the main topics of the domain and the relationships between these topics are mined from the outline. In the heuristic analysis, the results of the basic analysis are refined based on a set of heuristics that categorize the relationships. The heuristics entail the condition to be matched and the post-condition, i.e., the relationships that are recognized. Group heuristics identify relationships from homogeneous subitems or when the outline item contains certain keywords. Individual heuristics are tested on every subitem when no group heuristic is fired. Different heuristics can be fired together on the same group of subitems, in which case the most confident one is returned. The default heuristic (partOf) is returned when no other heuristic condition is met. Some of those heuristics rely on Natural Language Processing (NLP) services, for instance, those that identify entity names.
  33. With this algorithm we want to refine false partOf relationships into isA relationships using the domain knowledge contained in Wikipedia. First we identify sibling nodes, then we categorize them with Ponzetto and Strube's taxonomy, and the groups that share a common ancestor are mapped to their parent to form isA relationships. The steps are: identify groups of sibling nodes (topics) of the LDO extracted from the outline; select the groups of leaf nodes in which the partOf relationship has been identified, to which the subsequent steps are applied; normalize the text of the nodes (removing plural marks and apostrophes and ignoring case differences); link every node to those Wikipedia articles that are labelled with the normalized text of the node; run a disambiguation process based on WikiMiner to map each node to a unique article; process every group using Ponzetto and Strube's taxonomy [15] to look for a common ancestor; infer isA relationships in those groups that share a common ancestor, as long as it does not appear at the top levels of the taxonomy.
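The last two steps (looking for a common ancestor and inferring isA) can be sketched as follows. This is only an illustration under stated assumptions: the toy taxonomy, the `TOP_LEVEL` set and the function names are hypothetical and are not taken from WikiTaxonomy or from LiDom Builder.

```python
from typing import Dict, List, Set

# Hypothetical toy taxonomy: category -> parent categories (isA links).
TAXONOMY: Dict[str, List[str]] = {
    "Mars": ["Terrestrial planets"],
    "Venus": ["Terrestrial planets"],
    "Jupiter": ["Gas giants"],
    "Terrestrial planets": ["Planets"],
    "Gas giants": ["Planets"],
    "Planets": ["Astronomical objects"],
    "Astronomical objects": [],
}

# Assumption: categories considered too general to support an isA inference.
TOP_LEVEL: Set[str] = {"Astronomical objects"}


def ancestors(node: str, taxonomy: Dict[str, List[str]]) -> Set[str]:
    """Collect every category reachable by following parent links upwards."""
    seen: Set[str] = set()
    stack = list(taxonomy.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(taxonomy.get(current, []))
    return seen


def refine_group(siblings: List[str],
                 taxonomy: Dict[str, List[str]] = TAXONOMY,
                 top_level: Set[str] = TOP_LEVEL) -> str:
    """Return 'isA' when all siblings share a non-top-level ancestor,
    otherwise keep the default 'partOf'."""
    if not siblings:
        return "partOf"
    common = set.intersection(*(ancestors(s, taxonomy) for s in siblings))
    return "isA" if common - top_level else "partOf"


if __name__ == "__main__":
    print(refine_group(["Mars", "Venus", "Jupiter"]))
    print(refine_group(["Mars", "Unmapped topic"]))
```

A sibling that cannot be mapped to the taxonomy has no ancestors, so its group keeps the default partOf label.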
  34. 57 outlines from textbooks of different courses and domains have been processed. A gold-standard approach was followed: manually defined LDOs were used as the optimal output. The LDOs were restricted to the topics referred to in the outlines and the structural relationships between those topics, e.g., the isA relationship “Earth is a planet” and the partOf relationship “Earth belongs to the Solar System”. In total, 1197 partOf and 483 isA relationships were evaluated.
  35. Next we are going to present the results. The lack of knowledge on certain domains significantly affected the performance. For instance, it was observed that many of the topics involved in the missing isA relationships contained proper names; however, the entity name recognizer used in the experiment was unable to identify them. Using the Wikipedia-enhanced process, the overall performance has improved (87.70% precision and recall). Regarding partOf relationships, the results are quite similar: the recall has slightly decreased (96.49% vs. 98.66%) but the precision has slightly increased from 84.12% to 89.12%. Regarding isA relationships, the recall has dramatically increased from 21.20% to 50.53%, whereas the precision was hardly affected (77.30% vs. 78.95%).
  36. Next, I am going to talk about the tool that I have designed to identify relationships by processing the whole textbook, that is, the elicitation of relationships from the document body: LiReWi, a Relationship Extractor for Educational Ontologies from whole documents.
  37. To map the topics from Wikipedia to WordNet, the following process is carried out. The mapper first looks for the equivalent synset in two existing sets of mappings: those identified in the BabelNet project (Navigli2012) and those discovered by Fernando2013. If the same synset is found in both, the mapper assumes that there is no ambiguity and returns that synset. Otherwise, a disambiguation process is carried out to identify which of the candidate synsets is the appropriate one. To this end, a Page Rank Mapping Disambiguation step is carried out using UKB (Agirre and Soroa, 2009), a tool for Word Sense Disambiguation and for determining lexical similarity using a pre-existing knowledge base such as Wikipedia or WordNet. UKB requires a context to fulfil its goal. The procedure is similar to the one used in LiTeWi: the context is obtained from the topics extracted by LiTeWi along with the domain relatedness LiTeWi assigned to each of them, but in this case the topics are required to have only one sense in WordNet. The topics with the highest domain relatedness score and a unique meaning in WordNet constitute the context that allows choosing the synset for the topic. In the example of Figure 5.4, the synsets returned by the Navigli2012 and Fernando2013 mappings are different. Therefore, the Page Rank Mapping Disambiguation step is carried out to determine the final synset of syntax in WordNet. The context used in the example includes topics such as Programming, Menu bar and Java.
The Page Rank Mapping Disambiguation mechanism could select a synset different from those proposed by Navigli2012 and Fernando2013.
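The agreement check described in this note can be sketched as below. It is a minimal sketch, not the actual mapper: the two dictionaries stand in for the BabelNet and Fernando mappings, the synset identifiers are made up, and `disambiguate` is a hypothetical callable standing in for the UKB-based step (no real UKB API is used here).

```python
from typing import Callable, Dict, List, Optional


def map_to_synset(article: str,
                  babelnet_map: Dict[str, str],
                  fernando_map: Dict[str, str],
                  disambiguate: Callable[[str, List[str]], Optional[str]]
                  ) -> Optional[str]:
    """Return the synset both mappings agree on; otherwise defer to the
    (hypothetical) disambiguation step, which may pick any synset."""
    first = babelnet_map.get(article)
    second = fernando_map.get(article)
    if first is not None and first == second:
        return first  # both sources agree: assumed unambiguous
    candidates = [c for c in (first, second) if c is not None]
    return disambiguate(article, candidates)


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    babelnet = {"Syntax": "syntax#1", "Java": "java#3"}
    fernando = {"Syntax": "syntax#2", "Java": "java#3"}

    def pick_first(_article: str, candidates: List[str]) -> Optional[str]:
        # Stand-in for the disambiguation step: simply takes the first candidate.
        return candidates[0] if candidates else None

    print(map_to_synset("Java", babelnet, fernando, pick_first))
    print(map_to_synset("Syntax", babelnet, fernando, pick_first))
```

Passing the disambiguation step as a parameter keeps the agreement logic independent of whichever Word Sense Disambiguation tool is plugged in.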
  38. LiReWi: a Relationship Extractor for Educational Ontologies from whole documents.
  39. To elicit the pedagogical relationships between the domain topics, LiReWi follows the procedure shown in Figure 5.3. First, all the topics are mapped to the diverse knowledge bases (e.g. Wikipedia, WordNet and others derived from both) that will be used to identify the relationships. Then, several relationship extractors, each using a different approach, are run concurrently to elicit candidate relationships. Finally, the results are combined and filtered to obtain the final set of pedagogical relationships. In the next subsections, each step is described in more detail. Again, LiReWi was first tested on the Principles of Object-Oriented Programming (Wong2010) textbook to determine its optimal set-up and then evaluated on the Introduction to Astronomy (Morison2008) textbook. To extract pedagogical relationships between topics, LiReWi uses, in addition to shallow parsing techniques, several knowledge bases such as Wikipedia, WordNet, WikiTaxonomy, WibiTaxonomy and WikiRelations. To this end, it is necessary to map every topic to its corresponding entries in those knowledge bases. The topics identified by LiTeWi are already mapped and disambiguated to Wikipedia articles, and WikiTaxonomy, WikiRelations and WibiTaxonomy are based on Wikipedia articles. However, to be able to use WordNet, the topics must still be mapped to WordNet entries. WordNet organizes words (nouns, verbs, adjectives and adverbs) into sets of cognitive synonyms called synsets. Each synset refers to a distinct concept that can be expressed using different forms. Navigli2012 and Fernando2013 faced a similar problem and defined the mappings or equivalences between Wikipedia articles and WordNet synsets.
  40. Candidate relationship extraction: path-based extractors. These extractors take the topics mapped in the previous step, map them to the taxonomy of the resource and look for a path between them. Paths of limited length are used to infer the relationships, and the confidence depends on the path length.
WordNet Extractor: WordNet (Fellbaum1998) can be considered a huge graph of topics connected by semantic relationships such as hypernymy (isA) and meronymy.
WibiTaxonomy Extractor: WibiTaxonomy (Flati, 2014) is a knowledge base that comprises two interconnected taxonomies, the Wikipedia article taxonomy and the category taxonomy. Extracting relationships from WibiTaxonomy entails two steps. First, each topic is mapped to the article and category taxonomies using its mapped Wikipedia article. Then, paths of limited length are used to infer the relationships from both taxonomies.
WikiTaxonomy Extractor: WikiTaxonomy (Ponzetto & Strube, 2007) is a huge taxonomy derived from the Wikipedia category system where all the links between categories are represented by isA relationships. Moreover, WikiTaxonomy contains a dictionary where the articles are mapped to the corresponding category entries in the taxonomy. First, each topic is mapped to its corresponding WikiTaxonomy categories. Then, a DFS search is carried out to find the shortest upward path between the topics through the categories in WikiTaxonomy. The search is limited in length.
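A length-limited upward search of this kind can be sketched as follows. The toy parent map, the default limit of 3 and, in particular, the `1 / length` confidence are assumptions made for illustration; the note only states that the confidence depends on the path length, not how.

```python
from typing import Dict, List, Optional, Tuple


def shortest_upward_path(start: str, goal: str,
                         parents: Dict[str, List[str]],
                         max_len: int) -> Optional[int]:
    """Depth-limited DFS over parent (isA) links.
    Returns the number of edges of the shortest upward path, or None."""
    best: Optional[int] = None

    def dfs(node: str, depth: int, visited: frozenset) -> None:
        nonlocal best
        if best is not None and depth >= best:
            return  # cannot improve on the best path found so far
        if node == goal:
            best = depth
            return
        if depth == max_len:
            return  # length limit reached
        for parent in parents.get(node, ()):
            if parent not in visited:
                dfs(parent, depth + 1, visited | {parent})

    dfs(start, 0, frozenset({start}))
    return best


def extract_isa(topic: str, other: str,
                parents: Dict[str, List[str]],
                max_len: int = 3) -> Optional[Tuple[str, str, str, float]]:
    """Propose (topic, 'isA', other, confidence) when a short upward path exists.
    Assumption: confidence is 1 / path length, so shorter paths score higher."""
    length = shortest_upward_path(topic, other, parents, max_len)
    if not length:  # None (no path) or 0 (same topic)
        return None
    return (topic, "isA", other, 1.0 / length)


if __name__ == "__main__":
    toy_parents = {
        "Earth": ["Terrestrial planet"],
        "Terrestrial planet": ["Planet"],
        "Planet": ["Celestial body"],
    }
    print(extract_isa("Earth", "Planet", toy_parents))
    print(extract_isa("Planet", "Earth", toy_parents))
```

Because the search only follows parent links, a path from Earth up to Planet yields a candidate while the reverse direction yields none.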
  41. The WikiRelations (Nastase, 2008) knowledge base comprises a large set of tuples between Wikipedia categories containing several kinds of relationships. In this work, only the subset of tuples containing isA or partOf relationships has been employed. Topics are mapped directly to their corresponding categories. The confidence is based on the number of tuples containing that relationship.
  42. This slide depicts the Shallow Parsing Grammar Extractor. This extractor is based on Larrañaga's work for Basque…. The algorithm follows this procedure……. First…..
  43. This extractor aims to elicit sequential relationships such as prerequisite and pedagogicallyClose. The Sequential Extractor uses the information contained in the processed textbook along with information gathered from Wikipedia to extract these kinds of relationships. In particular, it uses the co-occurrences of the topics within the sentences along with the Wikipedia link structure between articles. To use the link structure between articles, this module relies on WikiMiner (Milne and Witten, 2013). The procedure is as follows (see Figure 5.15). First, as in the Shallow Parsing Grammar Extractor, the extractor identifies the topics that are referred to in the text. Once again, the system applies a simple matching algorithm in which compound terms take precedence over simple ones. The output of this process is a list of sentences that contain mentions of the input topics. Next, for each of those sentences, a reference relationship is defined between each pair of topics appearing in the sentence if the first topic refers to the second. A topic is considered to refer to another if a link from the first topic to the second exists in Wikipedia and their relatedness score is beyond an empirically determined threshold. LiReWi uses WikiMiner to compute the relatedness score of two topics. For example (see Figure 5.16), Topic1 and Topic2, which have links in both directions in Wikipedia, appear in the same sentence. As their relatedness is higher than the empirically determined threshold (0.7), a link between them is annotated. Topic3 only references Topic2, but their relatedness is below the threshold. Finally, for each linked topic pair, a sequential relationship is inferred. If the links between both topics are balanced, i.e., the number of links from the first topic to the second is similar to the number of links from the second to the first, a pedagogicallyClose relationship between both topics is inferred.
Otherwise, a prerequisite relationship is inferred from the topic with the highest number of outgoing links to the topic with the highest number of incoming links. Figure 5.17 shows two examples in which a pedagogicallyClose and a prerequisite relationship are inferred using this procedure. The confidence of the extracted relationships is calculated using Formula 5.4, where b is the base confidence (0.6), top1m is the number of links from the first topic, top2m is the number of links from the second topic and low is the threshold determining the minimum number of links for a relationship to be inferred, 2 in this case.
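The final inference step can be sketched as follows. Several details are assumptions, since the notes do not fix them: "similar" is modelled with a hypothetical `balance_ratio` parameter, the minimum of 2 links is applied to the larger of the two counts, and the confidence of Formula 5.4 is left out because the formula itself is not reproduced here.

```python
from typing import Optional, Tuple

Relationship = Tuple[str, str, str]  # (source topic, relationship, target topic)


def infer_sequential(topic1: str, topic2: str,
                     links_1_to_2: int, links_2_to_1: int,
                     low: int = 2,
                     balance_ratio: float = 0.5) -> Optional[Relationship]:
    """Infer a sequential relationship for one linked topic pair.

    low           minimum number of links required (2 in the notes); applying
                  it to the larger count is an assumption.
    balance_ratio hypothetical cut-off: the counts are 'balanced' when
                  smaller / larger >= balance_ratio.
    """
    larger = max(links_1_to_2, links_2_to_1)
    smaller = min(links_1_to_2, links_2_to_1)
    if larger < low:
        return None  # too few links to infer anything
    if smaller / larger >= balance_ratio:
        return (topic1, "pedagogicallyClose", topic2)
    # Unbalanced: from the topic with more outgoing links to the other one.
    if links_1_to_2 > links_2_to_1:
        return (topic1, "prerequisite", topic2)
    return (topic2, "prerequisite", topic1)


if __name__ == "__main__":
    print(infer_sequential("Topic1", "Topic2", 3, 3))
    print(infer_sequential("Topic1", "Topic2", 5, 1))
    print(infer_sequential("Topic1", "Topic2", 1, 0))
```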
  44. Here we can see, on the left, that a pedagogicallyClose relationship is inferred when the mentions are balanced. On the other hand, when the mentions are not balanced, a prerequisite relationship is inferred…..
  45. LiReWi: a Relationship Extractor for Educational Ontologies from whole documents.
  46. More confidence plus more extractors, from which it follows that…… conflicts removed
  47. LiReWi has been evaluated using two approaches, Gold Standard and expert validation. This time, LiReWi was first tuned up on the Principles of Object-Oriented Programming (Wong2010) textbook to determine its optimal set-up and subsequently evaluated on the Introduction to Astronomy (Morison2008) textbook. For that, the 199 topics from LiTeWi…. First, the evaluation of the mapping techniques is presented. Then, the evaluation of the candidate relationship extraction is presented and, finally, the evaluation of the combination and filtering is described. For the Gold Standard evaluation, four experts defined the set of relationships.
  48. 1. This presentation is organised in five main sections. 2. First, the context and motivation for this work will be presented in the Introduction. 3. Next, our proposal will be described, focusing on both the process carried out and the framework that has been developed for that purpose. 4. Following, the evaluation conducted to validate our proposal will be shown. 5. Finally, the conclusions and future lines identified will be presented.
  49. In DOM-Sortze, the process of acquiring relationships is divided into two parts. On the one hand, the outline is processed. On the other hand, an algorithm extracts relationships from the whole textbook, but only for Basque. In this work, the identification of pedagogical relationships has been addressed as follows. The DOM-Sortze approach for outlines has been extended: it has been generalized for English and evaluated, and its knowledge acquisition results have been improved using Wikipedia. For the analysis of the whole textbook, a new tool called LiReWi has been designed that improves the acquired knowledge.
  50. The DR Grammar was evaluated by analyzing the atomic LOs. (To be said aloud: and the extracted LOs.)
  51. For the DR Grammar, the results were similar to those of the experiments conducted on textbooks in Basque (Larrañaga 2012a). The accuracy was lower for problem statements (imperative cases are difficult to detect in English). For the Learning Objects we
  52. Evaluating the quality of the content in Wikipedia and WordNet is beyond the scope of this experiment. It is assumed that all the definitions and LOs extracted from Wikipedia and WordNet are correct. Therefore, the experiment consisted of measuring how much the use of LiLoWi enhanced the LO coverage for the LDO topics, and how many multilingual LOs were elicited.
  53. Here we can see the results of the evaluation of LiLoWi against DOM-Sortze in the previously described experiment. LiLoWi outperforms DOM-Sortze in the LO coverage of the terms by a margin of 25%. Moreover, we gain from references.
  54. For the final part of the presentation, some conclusions and future work will be presented
  55. Provision, in particular, of an OWL representation of multilingual learning ontologies... LiTeWi is the module responsible for the elicitation of multilingual terms for Educational Ontologies from electronic documents. It combines different approaches such as TF-IDF, KP-Miner, CValue and Shallow Parsing Grammar for unsupervised term extraction, using Wikipedia as a knowledge base. The approach carried out by LiTeWi entails three main steps: the identification of the topic candidates; the combination and refinement of the results to obtain the set of terms; and, finally, the mapping of the terms to other languages in Wikipedia. LiReWi (CondeLiReWi) is the module that implements a method for the elicitation of pedagogical relationships for Educational Ontologies from electronic document bodies. It combines shallow parsing techniques with several knowledge bases such as Wikipedia, WordNet, WikiTaxonomy, WibiTaxonomy and WikiRelations to elicit isA, partOf, prerequisite and pedagogicallyClose relationships. LiReWi also performs a three-step procedure to fulfil its task: first, all the topics are mapped to the diverse knowledge bases that will be used to identify the relationships; then, several relationship extractors, each using a different approach, are run concurrently to elicit candidate relationships; and, finally, the results are combined and filtered to obtain the final set of pedagogical relationships. In LiDom Builder the process of eliciting structural relationships (isA, partOf) from document outlines has also been enhanced with the inclusion of Wikipedia as an additional resource (Conde2014). LiLoWi is the module that enables the elicitation of new LOs, including some multilingual LOs, from both the original textbook body and different knowledge bases such as Wikipedia or WordNet. Once each topic of the LDO is mapped to Wikipedia and WordNet, LiLoWi retrieves the information from those two resources using their corresponding LO Extractors. Before incorporating Wikipedia and WordNet into the LO acquisition process, the validity of the proposal presented in LiDom Builder to incorporate the English language was also considered and tested (Conde2012).
Although the modular design of LiDom Builder facilitates the inclusion of a new language, some resources must be defined, in particular the heuristics and the grammars that allow the knowledge elicitation and the Discourse Markers for that language. Automatising the development of such resources will remarkably reduce the workload of integrating a new language. In the last few years, great advances have been made in Machine Translation. The research in that field might help to semi-automatically develop the grammars and heuristics for a new language from those already defined for a particular language. Furthermore, similar structures or equivalent patterns have been observed in the supported languages. Therefore, a meta model describing the generic patterns could be defined and rule-based transformations applied to obtain the specific grammars and heuristics for a particular language. LiDom Builder could try to identify LOs that are equivalents or translations in other languages. To this end, different means will be explored. For example:
• Latent Semantic Analysis (LSA) would be used to generate a model of each LO, and this model would be translated using Machine Translation techniques to obtain its equivalents in other languages. If a similar model were found for the translated model, then the equivalence between their corresponding LOs would be inferred.
• Additionally, another Machine Translation based approach might also be explored. To determine whether two LOs, say LO1 in English and LO2 in French, are equivalent, LiDom Builder could take advantage of Machine Translation techniques by generating their automatic translations before comparing them. If the translated LO1 (LOt1) were similar to LO2, or the translation of LO2 (LOt2) were similar to LO1, they could be considered equivalent. Diverse similarity and text reuse metrics would be tested in this approach.
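The second, translation-based idea can be sketched as follows. Everything here is illustrative: the translators are passed in as hypothetical callables (no real Machine Translation API is assumed), token-level Jaccard similarity stands in for the "diverse similarity and text reuse metrics" still to be tested, and the 0.6 threshold is arbitrary.

```python
from typing import Callable


def jaccard(text_a: str, text_b: str) -> float:
    """Token-level Jaccard similarity between two texts (case-insensitive)."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def are_equivalent(lo1: str, lo2: str,
                   translate_1_to_2: Callable[[str], str],
                   translate_2_to_1: Callable[[str], str],
                   threshold: float = 0.6) -> bool:
    """LO1 and LO2 are taken as equivalent when the translation of either
    one is similar enough to the other (threshold is an assumption)."""
    return (jaccard(translate_1_to_2(lo1), lo2) >= threshold
            or jaccard(translate_2_to_1(lo2), lo1) >= threshold)


if __name__ == "__main__":
    # Dictionary-backed stand-ins for a Machine Translation system.
    en_to_fr = {"the moon is a satellite": "la lune est un satellite"}
    fr_to_en = {v: k for k, v in en_to_fr.items()}
    print(are_equivalent("the moon is a satellite",
                         "la lune est un satellite",
                         lambda t: en_to_fr.get(t, t),
                         lambda t: fr_to_en.get(t, t)))
```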
  56. For example, adding new algorithms and techniques from the NLP domain, or using more advanced techniques for filtering unwanted acquired knowledge. Furthermore, similar structures or equivalent patterns have been observed in the supported languages. Therefore, a meta model describing the generic patterns could be defined and rule-based transformations applied to obtain the specific grammars and heuristics for a particular language. Furthermore, given the multilingual nature and the layout of Wikipedia, LiDom Builder is able to generate multilingual definitions from Wikipedia. Using ErauzOnt on Wikipedia, or on other additional resources, would allow the identification of additional monolingual LOs. To generate multilingual LOs from these resources, two different approaches could be applied. LiDom Builder could try to identify LOs that are equivalents or translations in other languages. To this end, different means will be explored. It can be observed that its representation is not far from the representation used in LiDom Builder to visualize the LDO, i.e., the Learning Domain Ontology. The elicitation of new types of relationships in LiDom Builder, different from the currently identified pedagogical relationships, would allow the automatic generation of concept maps related to the domain of the textbook that LiDom Builder used as a source. The concept maps, along with their localised views, would constitute a new kind of multilingual LOs.
  57. They can be extended,….. LiTeWi, for example, to Spanish, and also with more algorithms ....
  58. These are the main publications related to this thesis. There is another one under review and, finally, I would like to publish another one summarising the whole process.
  59. We have come to the end of the presentation. I’d just like to thank you for listening and would be pleased to take your comments and questions now.
  61. The presentation is divided into four main sections. First, an introduction to show the motivation and objective of the work. Next, our proposal to meet that objective, the abstract architecture SIgBLE. In the third section I will show the application of that abstract architecture in a real environment and the evaluation carried out. Finally, I will present the conclusions of the work.
  66. For enhancing the identification of the pedagogical relationships
  67. For enhancing the identification of the pedagogical relationships
  68. A conflict is found between the Earth isA Planet and Planet isA Earth proposals. The system looks at the link structure of the topics in Wikipedia, along with the confidence of the extracted relationships, to determine the final relationship. In the figure, Earth isA Planet has higher confidence than Planet isA Earth. In addition, “Earth” has a link to “Planet” in Wikipedia, whereas “Planet” does not have a link to “Earth”.
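One way such a conflict could be resolved is sketched below. How the two signals are actually combined is not stated in the notes, so the ordering used here (confidence first, Wikipedia link as a tie-breaker) is an assumption, as are the `Candidate` class and the `has_link` callable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    """A candidate relationship proposed by some extractor."""
    source: str
    relation: str
    target: str
    confidence: float


def resolve_conflict(first: Candidate, second: Candidate,
                     has_link: Callable[[str, str], bool]) -> Candidate:
    """Keep one of two conflicting candidates.
    Assumption: higher confidence wins; on a tie, the candidate whose source
    article links to its target article in Wikipedia is preferred."""
    def score(candidate: Candidate):
        return (candidate.confidence,
                has_link(candidate.source, candidate.target))
    return max((first, second), key=score)


if __name__ == "__main__":
    # Toy link structure: "Earth" links to "Planet", but not the other way round.
    links = {("Earth", "Planet")}
    winner = resolve_conflict(
        Candidate("Earth", "isA", "Planet", 0.8),
        Candidate("Planet", "isA", "Earth", 0.4),
        lambda src, dst: (src, dst) in links,
    )
    print(winner)
```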
  69. Next we are going to present the results. The lack of knowledge on certain domains significantly affected the performance. For instance, it was observed that many of the topics involved in the missing isA relationships contained proper names; however, the entity name recognizer used in the experiment was unable to identify them. A training process would be necessary for that purpose. The overall performance has improved (87.70% precision and recall). Regarding partOf relationships, the recall has slightly decreased (96.49% vs. 98.66%) but the precision has slightly increased from 84.12% to 89.12%. Regarding isA relationships, the recall has dramatically increased from 21.20% to 50.53%, whereas the precision was hardly affected (77.30% vs. 78.95%).
  70. Table 5.11 shows the results of the evaluation of the mapping step. The BabelNet approach led to the highest precision, 100%, but its recall was the lowest, only 14.73%. Fernando’s method, on the other hand, led to 83.33% precision with 18.42% recall. Our approach, which combines the BabelNet and Fernando’s mappings with UKB (Agirre, 2009), results in 97.82% precision and 23.68% recall, showing that it considerably increases the recall while minimizing the loss of precision. The F1-score is also shown in the table.
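For reference, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall figures quoted in this note; the values in Table 5.11 may differ slightly because of rounding, and the dictionary keys are just labels chosen here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in %) as quoted in the note above.
REPORTED = {
    "BabelNet": (100.0, 14.73),
    "Fernando": (83.33, 18.42),
    "Combined with UKB": (97.82, 23.68),
}

if __name__ == "__main__":
    for name, (precision, recall) in REPORTED.items():
        print(f"{name}: F1 = {f1_score(precision, recall):.2f}%")
```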