The document describes a framework called LiDom Builder that aims to automate the construction of multilingual domain modules. LiDom Builder enhances an existing framework called DOM-Sortze, which was limited to single language domain modules, by extending the formalism to represent multilingual domain modules and developing tools to extract multilingual terminology, identify pedagogical relationships, and generate multilingual learning objects from textbooks. The evaluation involves applying LiDom Builder to sample textbooks and measuring the accuracy of extracted terminology, relationships, and learning objects against gold standards. Results show LiDom Builder achieves better performance than existing tools.
Ph.D. Defense
1. LiDom Builder: Automatising the Construction of
Multilingual Domain Modules
Ángel Conde Manjón
GaLan Research Group – LSI Department
University of the Basque Country (UPV/EHU)
Supervisors:
Dr. Mikel Larrañaga Olagaray & Dr. Ana Arruarte Lasa
UPV/EHU
25 February 2016
2. • Technology Supported Learning Systems (TSLS)
• Learning Management Systems
• Massive Open Online Courses
• Intelligent Tutoring Systems: SQL-Tutor
• …
• Bilingual and Multilingual Contexts are a reality (Unesco, 2003)
• Acquiring the Domain Module is a costly and labour-intensive task
Introduction
Acquisition of Multilingual Terminology
Identification of Pedagogical Relationships
Gathering Learning Objects
Conclusions and Future Work
LiDom Builder
Context
4. DOM-Sortze (Larrañaga, 2012): a framework for building DOMAIN MODULES from electronic textbooks
Previous Work: DOM-Sortze
5. [Figure: the DOM-Sortze process. Steps (numbered 1 to 3 on the slide): Preprocess, LDO Gathering, LOs Gathering. Elements shown: Electronic Textbook, Document Body Internal Representation, Document Outline Internal Representation, Learning Domain Ontology, Domain Module.]
Previous Work: DOM-Sortze
6. [Figure: example domain module. The topics Planetary System, Solar System, Planet Earth, Satellite and Moon are linked by partOf, isA and prerequisite relationships. The learning object LO1 ("The Moon is Earth's only natural satellite") is attached through hasDR.]
DOM-Sortze: Domain Module Representation Formalism
Learning Domain Ontology (LDO)
Topics and pedagogical relationships
Learning Objects (LO)
• Definitions
• Examples
• Problem Statements
• …
7. Limitations of DOM-Sortze:
1. Developed for a single language: Basque.
2. Its formalism cannot represent Multilingual Domain Modules.
DOM-Sortze: Limitations
8.
1. Can the formalism used in DOM-Sortze be enhanced for Multilingual Domain Modules?
– Extend the formalism to deal with Multilingual Domain Modules.
2. Which enhancements are required to deal with various languages?
– Develop a method for extracting Multilingual Terminology.
– Improve the Relationship Acquisition.
– Provide a method for acquiring Multilingual Learning Objects.
Automatising the construction of MULTILINGUAL DOMAIN MODULES
Goals
9.
I. Introduction: Motivations and Goals
II. LiDom Builder: Building Multilingual Domain Modules
III. Acquisition of Multilingual Terminology
IV. Identification of Pedagogical Relationships
V. Gathering Multilingual Learning Objects
VI. Conclusions and Future Work
Outline
10.
I. Introduction: Motivations and Goals
II. LiDom Builder: Building Multilingual Domain Modules
III. Acquisition of Multilingual Terminology
IV. Identification of Pedagogical Relationships
V. Gathering Multilingual Learning Objects
VI. Conclusions and Future Work
Outline
12. [Figure: multilingual version of the example domain module. The topics Planetary System, Solar System, Planet Earth, Satellite and Moon are linked by partOf, isA, prerequisite and pedagogicallyClose relationships. Equivalent labels are tagged by language ("ilargi" @eu, "luna" @es, "moon" @en), and the learning objects LO1 and LO2 are attached through hasDR and also carry language tags.]
Multilingual Domain Module Formalism
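A minimal Python sketch of how the formalism on this slide could be held in memory. The class and field names (`Topic`, `LearningObject`, `labels`, `resources`) are illustrative assumptions, not the LiDom Builder data model. The labels and the LO1 text come from the slides; the LO2 text and the single relationship triple are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LearningObject:
    identifier: str
    language: str  # language code such as "en", "es" or "eu"
    content: str


@dataclass
class Topic:
    identifier: str
    labels: dict = field(default_factory=dict)            # language code -> equivalent label
    learning_objects: list = field(default_factory=list)  # resources attached through hasDR

    def label(self, language, fallback="en"):
        """Return the label in the requested language, or the fallback label."""
        return self.labels.get(language, self.labels.get(fallback))

    def resources(self, language):
        """Return the learning objects written in the given language."""
        return [lo for lo in self.learning_objects if lo.language == language]


moon = Topic(
    identifier="Moon",
    labels={"eu": "ilargi", "es": "luna", "en": "moon"},
    learning_objects=[
        LearningObject("LO1", "en", "The Moon is Earth's only natural satellite"),
        # Hypothetical Spanish resource, added only for illustration.
        LearningObject("LO2", "es", "La Luna es el único satélite natural de la Tierra"),
    ],
)

# Relationships are language independent: (source, type, target).
# This triple is an assumed reading of the diagram.
relationships = [("Moon", "isA", "Satellite")]
```

Keeping one topic node with per-language labels, rather than one node per language, means the pedagogical relationships are stored once and shared by all languages.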
13. [Figure: the enhanced process. Steps (numbered 0 to 3 on the slide): Language Identification, Preprocess, LDO Gathering, LOs Gathering. Elements shown: Electronic Textbook, Document Internal Representation, Document Outline Internal Representation, Learning Domain Ontology, Domain Module. Components shown: NLP Parsers (Illinois Chunker, Illinois POS tagger, FreeLing, IXA-Pipes), Topic Extraction, Relationship Extraction, Set of Heuristics, Grammar, Multilingual LOs, Discourse Markers. Tools: LiTeWi, LiReWi, LiLoWi.]
Proposed Enhancements
15.
• Two phases
  • Tuning up
    • Set the thresholds and default confidence values.
  • Evaluation
    • Gold Standard (Recall, Precision, F1-Score).
    • Expert validation.
• Use of three textbooks
  1. Programming: Introduction to Object Oriented Programming (Wong, S., 2010).
  2. Astronomy: Introduction to Astronomy (Morison, 2008).
  3. Biology: Introduction to Molecular Biology (Raineri, 2010).
General Evaluation Methodology
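The gold-standard measures named on this slide can be sketched as follows. This is a generic set-based computation of Precision, Recall and F1-Score, not the evaluation code used in the thesis; the function name `evaluate` and the two term sets are hypothetical.

```python
def evaluate(extracted, gold):
    """Compare extracted items with a gold standard; return (precision, recall, f1)."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: two of the three extracted terms are in the gold standard.
gold_terms = {"planet", "moon", "satellite", "orbit"}
extracted_terms = {"planet", "moon", "telescope"}
precision, recall, f1 = evaluate(extracted_terms, gold_terms)
```

With these sets, precision is 2/3, recall is 2/4 and F1 is their harmonic mean, 4/7.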
16.
I. Introduction: Motivation and Goals
II. LiDom Builder: Building Multilingual Domain Modules
III. Acquisition of Multilingual Terminology
IV. Identification of Pedagogical Relationships
V. Gathering Multilingual Learning Objects
VI. Conclusions and Future Work
Outline
17.
In DOM-Sortze, terminology was extracted with ErauzTerm (Alegria et al., 2004).
A new tool called LiTeWi has been developed.
Acquisition of Multilingual Terminology
18. LiTeWi
[Figure: LiTeWi pipeline, starting from the Electronic Textbook. Labels shown: Candidate Extraction (TF-IDF, KP-Miner, CValue, Shallow Parsing Grammar), Generic Corpus, Candidate Selection, Mapping, Disambiguation, Filtering, Combination, Mapping to other languages.]
19. Shallow Parsing Algorithm
• Uses a grammar derived from (Larrañaga, 2012).
[Figure: the Shallow Parser applies a Constraint Grammar to POS tags. Textbook sentences that may contain topics ("This is called an Array List", "A Stack is used to model systems that exhibit LIFO…") are matched against grammar rules such as "Topic + [*]+ part of + [det] +Topic". The extraction rules produce chunks ("an Array List", "A Stack", …), from which the topics (Array List, Stack, …) are obtained.]
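The idea of pattern-based topic extraction on this slide can be illustrated with a rough Python sketch. Note the substitution: the slide describes a constraint grammar applied to POS tags, whereas this sketch uses regular expressions over the raw sentence text. The two rules and the names `RULES` and `extract_topics` are hypothetical and are not taken from the grammar of (Larrañaga, 2012).

```python
import re

DET = r"(?:a|an|the|A|An|The)"
# A topic is approximated as a run of capitalised words, e.g. "Array List".
TOPIC = r"([A-Z][\w-]*(?:\s+[A-Z][\w-]*)*)"

# Hypothetical surface rules standing in for grammar rules over POS tags.
RULES = [
    re.compile(rf"\bis called\s+(?:{DET}\s+)?{TOPIC}"),
    re.compile(rf"^(?:{DET}\s+)?{TOPIC}\s+is used to\b"),
]


def extract_topics(sentences):
    """Return the topics matched by any rule, in order of first appearance."""
    topics = []
    for sentence in sentences:
        for rule in RULES:
            match = rule.search(sentence)
            if match and match.group(1) not in topics:
                topics.append(match.group(1))
    return topics


# The three example sentences shown on the slide.
SENTENCES = [
    "Sentences may contain topics",
    "This is called an Array List",
    "A Stack is used to model systems that exhibit LIFO",
]
TOPICS = extract_topics(SENTENCES)
```

The determiner is matched outside the capture group, so the chunk "an Array List" yields the topic "Array List".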
20. LiTeWi
[Figure: LiTeWi pipeline diagram, as on slide 18.]
21. Mapping
• Terms mapped to their corresponding Wikipedia articles.
• Search procedure to match Wikipedia article titles and their labels.
22. LiTeWi
[Figure: LiTeWi pipeline diagram, as on slide 18.]
23. Disambiguation
• Method based on global disambiguation (Milne et al., 2008).
• Domain knowledge step added to improve the results.
• The important domain terms are used as the disambiguation context.
• Gold Term List: important domain terms with only one sense, i.e. the monosemic terms that have the highest CValue score.
24. Disambiguation
[Figure: the Term List to disambiguate (Java, Inheritance, Property) and the Gold Term List (Class, Programming Language, Array List) are passed to the Wikiminer Compare Service. Relatedness of each candidate sense of "Java" to the gold terms:]
Sense of "Java"   Class   Prog. Lang.   Array List   Average
Prog. Language    0.90    0.85          0.64         0.89
Island            0.7     0.77          0.53         0.70
City              0.56    0.75          0.6          0.63
Disambiguated Term: Java (programming Language)
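The selection step in this example can be sketched in a few lines of Python. The sketch assumes the chosen sense is the one with the highest arithmetic mean relatedness to the gold terms. That is an assumption about the scoring, and the Average column on the slide may have been computed differently. The relatedness values are those listed on the slide; the names `CANDIDATE_SENSES` and `disambiguate` are hypothetical.

```python
from statistics import mean

# Relatedness of each candidate sense of "Java" to the gold terms
# (Class, Programming Language, Array List), as listed on the slide.
CANDIDATE_SENSES = {
    "Java (programming language)": [0.90, 0.85, 0.64],
    "Java (island)": [0.7, 0.77, 0.53],
    "Java (city)": [0.56, 0.75, 0.6],
}


def disambiguate(candidates):
    """Pick the sense with the highest mean relatedness to the gold terms."""
    return max(candidates, key=lambda sense: mean(candidates[sense]))


BEST_SENSE = disambiguate(CANDIDATE_SENSES)
```

In the real pipeline the scores would come from the Wikiminer Compare Service rather than from a hard-coded table.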
25. LiTeWi
[Figure: LiTeWi pipeline diagram, as on slide 18.]
26. Filtering Unwanted Terms
[Figure: each term in the Term List to filter (Universal Studios, Planet, Windows 98) is compared with the Gold Term List (Solar System, Black Hole, Solar Mass) through the Wikiminer Compare Service. Two checks are shown: Relatedness Score (Threshold >= 0.6) and Number of Related Gold Terms (N > 1). Example scores shown: Solar System (0.34), Black Hole (0.53), Solar Mass (0.47); and Solar System (0.23), Black Hole (0.68), Solar Mass (0.50). Planet is the resulting Domain Related Term; Universal Studios and Windows 98 are filtered out.]
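One possible reading of the two checks on this slide, as a Python sketch: a term is kept when more than one gold term has a relatedness score of at least 0.6. This combined rule is an assumption. The slide does not state how the two checks interact or which term each example score set belongs to. The function name `is_domain_related` and the scores for "Planet" are hypothetical; the other two score sets are the ones printed on the slide.

```python
def is_domain_related(scores, threshold=0.6, min_related=2):
    """Keep a term if at least `min_related` gold terms reach the threshold.

    `scores` maps each gold term to its relatedness with the candidate term.
    min_related=2 encodes the slide's "N > 1" condition.
    """
    related = sum(1 for score in scores.values() if score >= threshold)
    return related >= min_related


# Score sets printed on the slide (their owning terms are not stated there).
SLIDE_SCORES_A = {"Solar System": 0.34, "Black Hole": 0.53, "Solar Mass": 0.47}
SLIDE_SCORES_B = {"Solar System": 0.23, "Black Hole": 0.68, "Solar Mass": 0.50}
# Hypothetical scores for a term that should be kept, such as "Planet".
PLANET_SCORES = {"Solar System": 0.82, "Black Hole": 0.61, "Solar Mass": 0.66}
```

Under this reading, neither score set from the slide passes: the first has no gold term at or above the threshold, and the second has only one.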
27. LiTeWi
[Figure: LiTeWi pipeline diagram, as on slide 18, with an example of the mapping to other languages:]
Topic   EN     ES     EU
Moon    Moon   Luna   Ilargia
28. Evaluation
Tuning up
• Introduction to Object Oriented Programming textbook.
Evaluation
• Gold Standard and Expert Validation.
• Gold Standard based on the terms appearing in the index of each textbook.
• Evaluated on Introduction to Astronomy and Introduction to Molecular Biology.
30. Outline
I. Introduction: Motivation and Goals
II. LiDom Builder: Building Multilingual Domain Modules
III. Acquisition of Multilingual Terminology
IV. Identification of Pedagogical Relationships
V. Gathering Multilingual Learning Objects
VI. Conclusions and Future Work
31. Introduction
In DOM-Sortze, relationships were acquired for Basque using Shallow Parsing.
An adaptation and extension of the heuristic-based analysis of the outline has been developed.
A new tool called LiReWi has been developed.
32. Heuristic-based analysis of the outline
Document Outlines
• Reflect the organization made by the author.
• The structure of the outline reflects underlying pedagogical relationships.
• Low cost process (summarised).
DOM-Sortze
• Each outline item is considered as a domain topic.
• By default gathers a partOf relation between an item and its subitems.
• Heuristics to detect isA relations.
LiDom Builder
• Adaptation to English of heuristics from (Larrañaga et al., 2004).
• Improvement of isA identification using Wikitaxonomy (Ponzetto et al., 2007).
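The default rule described on this slide, a partOf relation between each outline item and its subitems, can be sketched in Python. This covers only the default rule, not the isA heuristics. The function name `part_of_relations` and the (number, title) input format are assumptions; the sample entries are simplified from the polymer outline shown on slide 33.

```python
def part_of_relations(outline):
    """Return (child, "partOf", parent) triples for a numbered outline.

    `outline` is a list of (section number, title) pairs such as
    ("4.5.1", "Thermosettings").
    """
    titles = dict(outline)  # section number -> title
    relations = []
    for number, title in outline:
        parent_number = number.rpartition(".")[0]  # "4.5.1" -> "4.5", "4" -> ""
        if parent_number in titles:
            relations.append((title, "partOf", titles[parent_number]))
    return relations


OUTLINE = [
    ("4", "Structure of polymers"),
    ("4.5", "Families of polymeric materials"),
    ("4.5.1", "Thermosettings"),
    ("4.5.2", "Thermoplastics"),
    ("4.5.3", "Elastomers"),
]
RELATIONS = part_of_relations(OUTLINE)
```

Top-level items such as "4" have no parent in the outline, so they produce no relation.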
33. Wikipedia Enhanced Process
Example outline:
………..
4.- Structure of polymers / Macromolecules
4.1.- Polymer chemistry
4.2.- Molecular weight
4.3.- Form, structure and molecular configuration
4.3.- Supramolecular arrangement
4.4.- Crystalline and amorphous polymers
4.5.- Families of polymeric materials
4.5.1.- Thermosettings
4.5.2.- Thermoplastics
4.5.3.- Elastomers
5.- Phase diagrams / Definitions
5.1.- Solid solutions
5.2.- Phases rule of Gibbs
5.3.- Types of phase diagram
Steps:
1. Identify groups of sibling nodes.
2. Select the groups of leaf nodes in which the partOf relationship has been identified.
3. Link and disambiguate each node to a Wikipedia article using Wikiminer (Milne et al., 2012):
   Thermosettings polymer (Article id= 321827)
   Thermoplastic (Article id= 182444)
   Elastomer (Article id = 842224)
4. Process every group using the (Ponzetto et al., 2007) taxonomy.
   [Figure: taxonomy fragment with the categories Materials science, Polymer physics, Polymer chemistry and Elastomers.]
5. Infer the isA relationship in those groups that share a common ancestor.
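Step 5 of this process can be sketched in Python as a common-ancestor test over a category taxonomy. The sketch assumes that, when a sibling group shares an ancestor, each sibling is linked to the parent outline item with isA; that is one reading of the slide. The edges in `TAXONOMY` are hypothetical (the slide shows only the category names), and the names `ancestors` and `infer_is_a` are illustrative.

```python
def ancestors(node, parents):
    """Return every category reachable upwards from `node`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_is_a(group, parent_topic, parents):
    """Return isA triples if all members of `group` share a common ancestor."""
    if not group:
        return []
    common = set.intersection(*(ancestors(node, parents) for node in group))
    if not common:
        return []
    return [(node, "isA", parent_topic) for node in group]


# Hypothetical taxonomy edges: child -> list of parent categories.
TAXONOMY = {
    "Thermosetting polymer": ["Polymer chemistry"],
    "Thermoplastic": ["Polymer physics"],
    "Elastomer": ["Elastomers"],
    "Elastomers": ["Polymer physics"],
    "Polymer chemistry": ["Materials science"],
    "Polymer physics": ["Materials science"],
}

GROUP = ["Thermosetting polymer", "Thermoplastic", "Elastomer"]
IS_A = infer_is_a(GROUP, "Families of polymeric materials", TAXONOMY)
```

With these edges the three siblings meet at "Materials science", so three isA triples are produced. A group containing a node with no taxonomy entry produces none.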
34. Evaluation
Gold Standard
• 57 document outlines in English from different domains.
• Human instructors defined the optimal output (LDOs).
• Each LDO restricted to the topics of the outline.
35. Results
• Heuristic Analysis
                partOf   isA     Total
Precision (%)   84.12    78.95   83.85
Recall (%)      98.66    21.20   83.85
• Heuristic Analysis + Wikipedia Enhanced Process
                partOf   isA     Total
Precision (%)   89.19    77.30   87.70
Recall (%)      96.49    50.53   87.70
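For readers unfamiliar with the measures in these tables, the following is a minimal sketch of how precision and recall can be computed against a gold standard. The `Relation` tuple layout and the function name are assumptions made for illustration; the slides do not show the evaluation code.

```python
from typing import Set, Tuple

# (source topic, relation type, target topic); this layout is an assumption.
Relation = Tuple[str, str, str]


def precision_recall(extracted: Set[Relation], gold: Set[Relation]) -> Tuple[float, float]:
    """Return (precision, recall) as percentages."""
    correct = len(extracted & gold)
    precision = 100.0 * correct / len(extracted) if extracted else 0.0
    recall = 100.0 * correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data, not taken from the evaluation.
gold = {("A", "partOf", "X"), ("B", "partOf", "X"), ("C", "isA", "Y"), ("D", "isA", "Y")}
extracted = {("A", "partOf", "X"), ("B", "partOf", "X"), ("C", "isA", "Y"), ("D", "partOf", "Y")}
precision, recall = precision_recall(extracted, gold)
```

When every gold topic pair receives exactly one predicted relation, the number of extracted and gold relations coincide, so overall precision and recall are equal, which is consistent with the identical Total values in the tables.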
36. Identification of Pedagogical Relationships: LiReWi
[Figure: LiReWi pipeline. Inputs: Electronic Textbook, Topics, Knowledge Bases. Stages: Mapping, Candidate Relationship Extraction, Combination & Filtering.]
37. Mapping
[Figure: mapping a topic to WordNet]
Topic: Syntax (Wikipedia id = 3206060, WordNet id = ?)
Mapping To WordNet: the Wikipedia id is looked up in two mapping tables, Fernando's Mappings and Babelnet Mappings. For Wiki Id 3206060 the candidate WordNet ids are 8436203 and 6176322.
Comparer: the returned WordNet ids differ (!=), so disambiguation is needed.
Page Rank Disambiguation: candidate WordNet ids (8436203, 6176322, …) with Disambiguation Context (Java, Programming, …).
Final id: mapped WordNet id returned = 6176322.
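The comparer logic suggested by this slide can be sketched as follows. This is only an illustration under assumptions: `map_to_wordnet`, the two lookup tables and the `disambiguate` callback are hypothetical names, and the context scores are made up; the actual mapper uses Page Rank based disambiguation, which is not reproduced here.

```python
from typing import Callable, Dict, List, Optional


def map_to_wordnet(wiki_id: int,
                   table_a: Dict[int, List[int]],
                   table_b: Dict[int, List[int]],
                   disambiguate: Callable[[List[int]], int]) -> Optional[int]:
    """Merge the candidates of two mapping tables; disambiguate only on disagreement."""
    candidates = sorted(set(table_a.get(wiki_id, [])) | set(table_b.get(wiki_id, [])))
    if not candidates:
        return None                  # topic could not be mapped
    if len(candidates) == 1:
        return candidates[0]         # both tables agree (or only one answers)
    return disambiguate(candidates)  # tables disagree: pick using the context


# Toy tables; which real mapping yields which id is not asserted here.
table_a = {3206060: [8436203]}
table_b = {3206060: [6176322]}
# Made-up scores standing in for a context-based disambiguator.
context_score = {8436203: 0.2, 6176322: 0.8}
chosen = map_to_wordnet(
    3206060, table_a, table_b,
    lambda ids: max(ids, key=lambda i: context_score.get(i, 0.0)),
)
```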
38. Identification of Pedagogical Relationships: LiReWi
[Figure: LiReWi pipeline. Inputs: Electronic Textbook, Topics, Knowledge Bases. Stages: Mapping, Candidate Relationship Extraction, Combination & Filtering.]
39. Candidate Relationship Extraction
[Figure: six extractors (WordNet Extractor, Wibi Extractor, WikiRelations Extractor, WikiTaxonomy Extractor, Shallow Parsing Grammar Extractor, Sequential Extractor) take the NLP data and produce Candidate Relationships of types isA, partOf, prerequisite and pedagogicallyClose.]
40. Candidate Relationship Extraction
Path Based Extractors:
Mars isA Rocky planet (path length=1, confidence=1)
Mars isA Planet (path length=2, confidence=0.9)
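A path-based extractor of this kind can be sketched as a breadth-first walk over a hypernym graph in which confidence decreases with path length. The linear decay in `path_confidence` is an assumption chosen only because it reproduces the two values on the slide (1 for length 1, 0.9 for length 2); the `HYPERNYMS` map is toy data.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Toy hypernym graph (child -> parents); illustrative only.
HYPERNYMS: Dict[str, List[str]] = {
    "Mars": ["Rocky planet"],
    "Rocky planet": ["Planet"],
}


def path_confidence(length: int, decay: float = 0.1) -> float:
    """Assumed linear decay: 1.0 for a direct link, 0.1 less per extra hop."""
    return max(0.0, 1.0 - decay * (length - 1))


def isa_candidates(topic: str, topics: Set[str]) -> List[Tuple[str, str, str, float]]:
    """Walk upwards from topic and emit isA candidates for other domain topics."""
    results = []
    queue = deque([(topic, 0)])
    visited = {topic}
    while queue:
        node, dist = queue.popleft()
        for parent in HYPERNYMS.get(node, []):
            if parent in visited:
                continue
            visited.add(parent)
            if parent in topics:
                results.append((topic, "isA", parent, path_confidence(dist + 1)))
            queue.append((parent, dist + 1))
    return results


candidates = isa_candidates("Mars", {"Planet", "Rocky planet"})
```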
41. Candidate Relationship Extraction
• WikiRelations: Set of tuples that state the relationships between Wikipedia categories.
WikiRelations Tuples:
T Tauri star, Star, isA
Radiation, Radio waves, partOf
Light, Electromagnetic radiation, partOf
007 license to kill, video games, isA
…………
Topic: Light (Cat1: Light, Cat2: …)
Topic: Electromagnetic radiation (Cat1: Electromagnetic radiation)
Topic: ……
Result: Light partOf Electromagnetic radiation (Confidence=0.7)
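The lookup on this slide can be sketched as a join between the categories of each topic and the tuple set. The dictionaries below are toy data, and the fixed confidence of 0.7 is taken from the slide example only; how the real extractor assigns confidence is not shown, so treat it as an assumption.

```python
from typing import Dict, List, Tuple

# (category A, category B) -> relation; a toy excerpt of a tuple set.
WIKI_RELATIONS: Dict[Tuple[str, str], str] = {
    ("T Tauri star", "Star"): "isA",
    ("Light", "Electromagnetic radiation"): "partOf",
}

# Topic -> Wikipedia categories it was mapped to (toy data).
TOPIC_CATEGORIES: Dict[str, List[str]] = {
    "Light": ["Light"],
    "Electromagnetic radiation": ["Electromagnetic radiation"],
}


def wikirelations_candidates(topics: List[str],
                             confidence: float = 0.7) -> List[Tuple[str, str, str, float]]:
    """Emit a candidate whenever categories of two topics appear in a tuple."""
    found = []
    for a in topics:
        for b in topics:
            if a == b:
                continue
            for cat_a in TOPIC_CATEGORIES.get(a, []):
                for cat_b in TOPIC_CATEGORIES.get(b, []):
                    relation = WIKI_RELATIONS.get((cat_a, cat_b))
                    if relation:
                        found.append((a, relation, b, confidence))
    return found


result = wikirelations_candidates(["Light", "Electromagnetic radiation"])
```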
42. Candidate Relationship Extraction
• Extractor based on the rules defined in (Larrañaga, 2012).
[Figure: Shallow Parsing Grammar Extractor]
Textbook + Topics (Solar System, Earth, Planet, Mars) → Find Mentions → Constraint Grammar applied to POS tags → Relationships
Sentences with mentions:
Earth is part of the Solar System.
……………….
Grammar:
Topic + [*] + part of + [det] + Topic
……………….
Relationships:
Earth partOf Solar System
……………….
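The grammar rule on this slide, Topic + [*] + part of + [det] + Topic, can be approximated with a regular expression. This is a deliberate simplification: the slide states that a Constraint Grammar is applied to POS tags, whereas the sketch below matches raw text, and the function name and determiner list are assumptions.

```python
import re
from typing import List, Tuple


def part_of_relations(sentences: List[str], topics: List[str]) -> List[Tuple[str, str, str]]:
    """Match 'Topic ... part of [det] Topic' on raw sentences."""
    # Longest labels first so multi-word topics win over shorter overlaps.
    alternatives = "|".join(re.escape(t) for t in sorted(topics, key=len, reverse=True))
    pattern = re.compile(
        rf"\b({alternatives})\b.*?\bpart of\b\s+(?:(?:the|a|an)\s+)?({alternatives})\b",
        re.IGNORECASE,
    )
    canonical = {t.lower(): t for t in topics}
    relations = []
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            source = canonical[match.group(1).lower()]
            target = canonical[match.group(2).lower()]
            relations.append((source, "partOf", target))
    return relations


found = part_of_relations(
    ["Earth is part of the Solar System.", "Mars is a planet."],
    ["Solar System", "Earth", "Planet", "Mars"],
)
```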
43. Candidate Relationship Extraction
[Figure: extraction of pedagogicallyClose candidates]
Textbook + Topics (Wavelength, Emission spectrum, Planet, Solar System) → Find Mentions → Look links in/links out on Wikipedia → Reasoner → Relations
Sentences with mentions:
...leading to different radiated wavelengths, make up an emission spectrum.
... the emission spectrum of a particular star, the wavelength of …
……………..
Possible candidates: Wavelength, Emission Spectrum (2 times)
Wikipedia links: Emission spectrum (link out) Wavelength; Wavelength (link out) Emission spectrum
Relatedness > threshold
Relations:
Emission spectrum pedagogicallyClose Wavelength
…………………….
44. Candidate Relationship Extraction
[Figure: mention (link) counts between topic pairs]
Topic1 is pedagogicallyClose to Topic2:
Topic1 mentions (links): Topic2, 3 mentions
Topic2 mentions (links): Topic1, 4 mentions
Topic3 is a prerequisite of Topic4:
Topic3 mentions (links): Topic4, 1 mention
Topic4 mentions (links): Topic3, 4 mentions
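One way to read this figure is that balanced mention counts suggest pedagogicallyClose, while a strongly one-sided count suggests that the heavily mentioned topic is a prerequisite of the one that mentions it. The sketch below encodes that reading; the function name and both thresholds (`min_mentions`, `max_ratio`) are assumptions, and the real reasoner also uses Wikipedia links and a relatedness threshold that are not modelled here.

```python
from typing import Optional, Tuple


def classify_pair(a: str, b: str, a_mentions_b: int, b_mentions_a: int,
                  min_mentions: int = 3,
                  max_ratio: float = 2.0) -> Optional[Tuple[str, str, str]]:
    """Classify a topic pair from how often each one mentions the other."""
    high = max(a_mentions_b, b_mentions_a)
    low = min(a_mentions_b, b_mentions_a)
    if high < min_mentions:
        return None  # too little evidence for any relationship
    if low > 0 and high / low < max_ratio:
        return (a, "pedagogicallyClose", b)  # roughly symmetric mentions
    # The topic that is mentioned far more often is taken as the prerequisite.
    if b_mentions_a > a_mentions_b:
        return (a, "prerequisite", b)
    return (b, "prerequisite", a)


close = classify_pair("Topic1", "Topic2", a_mentions_b=3, b_mentions_a=4)
prereq = classify_pair("Topic3", "Topic4", a_mentions_b=1, b_mentions_a=4)
```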
45. Identification of Pedagogical Relationships: LiReWi
[Figure: LiReWi pipeline. Inputs: Electronic Textbook, Topics, Knowledge Bases. Stages: Mapping, Candidate Relationship Extraction, Combination & Filtering. Output: Learning Domain Ontology.]
46. Combination & Filtering Relationships
Relationships (candidates):
-Earth isA Planet (WordNet Ex) (Conf=1)
-Earth isA Planet (WikiRelations Ex) (Conf=0.8)
-Planet isA Earth (WikiTax Ex) (Conf=0.7)
-Earth partOf Solar System (WordNet Ex) (Conf=1)
-Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
Confidence Combiner → Relationships combined:
-Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1)
-Planet isA Earth (WikiTax Ex) (Conf=0.7)
-Earth partOf Solar System (WordNet Ex) (Conf=1)
-Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
Conflict Resolver → Conflict Resolution:
-Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1)
-Earth partOf Solar System (WordNet Ex) (Conf=1)
-Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
Discarded: Planet isA Earth (WikiTax Ex) (Conf=0.7)
Filter (below threshold) → Final Relationships:
-Earth isA Planet (WordNet Ex, WikiRelations Ex) (Conf=1)
-Earth partOf Solar System (WordNet Ex) (Conf=1)
Discarded: Earth isA Terrestrial Planet (WikiTax Ex) (Conf=0.5)
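The three stages on this slide can be sketched as three small functions. Several choices below are assumptions: merging duplicates by taking the maximum confidence (the slide only shows 1 and 0.8 combining to 1), treating a relation and its inverse as the only kind of conflict, and the threshold of 0.6 (the slide only shows that 0.5 is filtered out and 1 is kept).

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]            # (source, relation, target)
Merged = Dict[Key, Tuple[List[str], float]]


def combine(candidates: List[Tuple[str, str, str, str, float]]) -> Merged:
    """Merge identical relationships; keep all extractors and the max confidence."""
    merged: Merged = {}
    for source, relation, target, extractor, confidence in candidates:
        extractors, best = merged.get((source, relation, target), ([], 0.0))
        merged[(source, relation, target)] = (extractors + [extractor], max(best, confidence))
    return merged


def resolve_conflicts(merged: Merged) -> Merged:
    """Drop a relationship when its inverse has strictly higher confidence."""
    kept: Merged = {}
    for (source, relation, target), value in merged.items():
        inverse = merged.get((target, relation, source))
        if inverse is not None and inverse[1] > value[1]:
            continue
        kept[(source, relation, target)] = value
    return kept


def filter_low(resolved: Merged, threshold: float = 0.6) -> Merged:
    """Keep only relationships at or above the (assumed) threshold."""
    return {key: value for key, value in resolved.items() if value[1] >= threshold}


candidates = [
    ("Earth", "isA", "Planet", "WordNet", 1.0),
    ("Earth", "isA", "Planet", "WikiRelations", 0.8),
    ("Planet", "isA", "Earth", "WikiTax", 0.7),
    ("Earth", "partOf", "Solar System", "WordNet", 1.0),
    ("Earth", "isA", "Terrestrial Planet", "WikiTax", 0.5),
]
final = filter_low(resolve_conflicts(combine(candidates)))
```

Note that with equal confidences both directions of a conflicting pair survive in this sketch; a real resolver would need an explicit tie-breaking rule.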
47. Evaluation
Tuning up
• Introduction to Object Oriented Programming textbook.
Evaluation
• Gold Standard and Expert Validation.
• Introduction to Astronomy textbook.
• Gold standard, four experts stated the set of relationships.
• Using a subset of the main domain topics according to the score given by LiTeWi.
49. Outline
I. Introduction: Motivations and Goals
II. LiDom Builder: Building Multilingual Domain
Modules
III. Acquisition of Multilingual Terminology
IV. Identification of Pedagogical Relationships
V. Gathering Multilingual Learning Objects
VI. Conclusions and Future Work
50. Gathering Multilingual Learning Objects: Introduction
In DOM-Sortze, LO acquisition was carried out for Basque using shallow parsing.
A validation of the approach for English has been carried out.
LiLoWi has been developed to move towards the elicitation of multilingual LOs.
51. Adapting Learning Object elicitation to English
          Basque                                                  English
Pattern   adibidez, @topic                                        for instance, @topic
Example   Uretan, adibidez hidrogeno eta oxigeno atomoak daude.   For instance, there are hydrogen and oxygen atoms in water.
[Figure: LO elicitation]
Textbook + Topics (Wavelength, Emission spectrum, Earth, Solar System) → Find Mentions → Grammar → Learning Objects
Sentences with mentions:
Earth is a planet.
……………….
Learning Objects:
The Moon is Earth's only natural satellite
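The per-language pattern in the table above can be illustrated with a small sketch that keeps one example marker per language and looks for sentences containing both the marker and a topic mention. This is a simplification under assumptions: the real grammar works on parsed text, `EXAMPLE_MARKERS` and `example_los` are hypothetical names, and the prefix match on the topic label is only a crude stand-in for lemmatisation (it lets the Basque stem "ur" match the inflected form "Uretan").

```python
import re
from typing import Dict, List, Tuple

# One example marker per language, taken from the slide's pattern row.
EXAMPLE_MARKERS: Dict[str, str] = {"en": "for instance", "eu": "adibidez"}


def example_los(sentences: List[str], topic_labels: List[str],
                lang: str) -> List[Tuple[str, str]]:
    """Return (topic label, sentence) pairs that look like example LOs."""
    marker = EXAMPLE_MARKERS[lang]
    found = []
    for sentence in sentences:
        lowered = sentence.lower()
        if marker not in lowered:
            continue
        for label in topic_labels:
            # Word-initial prefix match: crude substitute for lemmatisation.
            if re.search(rf"\b{re.escape(label.lower())}", lowered):
                found.append((label, sentence))
    return found


english = example_los(
    ["For instance, there are hydrogen and oxygen atoms in water.", "Earth is a planet."],
    ["water"], "en",
)
basque = example_los(
    ["Uretan, adibidez hidrogeno eta oxigeno atomoak daude."],
    ["ur"], "eu",
)
```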
55. Evaluation
• Evaluated on the Principles of Object-Oriented Programming.
• Used the same LDO described in the previous experiment.
• Expert Validation.
Two Aspects
• How LiLoWi enhanced the LO coverage for the LDO topics.
• How many multilingual LOs are extracted.
56. Results
• Grammar + Wikipedia/WordNet
                      Definitions                              References
                      English   Spanish   Basque   French
Number of topics      46        36        9        36          12
Topic coverage (%)    56.10     43.90     10.97    43.90       14.63
• Grammar-based approach
                      Total   Definitions
Number of topics      21      19
Topics coverage (%)   25.61   19.51
57. Outline
I. Introduction: Motivation and Goals
II. LiDom Builder: Building Multilingual Domain Modules
III. Acquisition of Multilingual Terminology
IV. Identification of Pedagogical Relationships
V. Gathering Multilingual Learning Objects
VI. Conclusions and Future Work
58. Goal Achievement
1. Provision of a suitable formalism to represent Multilingual Domain Modules.
2. Developed a method for the elicitation of multilingual terminology.
– First term extractor, to our knowledge, based on searching patterns for educational content.
3. Relationship Acquisition has been improved.
– Extension of the outline processor to English + Enhancement with Wikipedia.
– Development of LiReWi, a module for the elicitation of pedagogical relationships for Educational Ontologies.
– Developed a state-of-the-art mapper from Wikipedia to WordNet.
4. Developed a method for multilingual LO generation.
– Extension of DOM-Sortze for English.
– Development of LiLoWi, a module for the elicitation of multilingual LOs using different knowledge bases.
59. Future Work
• Automatising the inclusion of new languages.
• Multilingual Learning Object generation from similarity and machine translation techniques.
• Concept Map-Based Learning Object Generation.
• Improvements on each module of LiDom Builder.
60. Software Released
Software
• LiTeWi, released with Spanish/English support: https://github.com/Neuw84/LiTe
• Wikipedia/WordNet mapper: https://github.com/Neuw84/Wikipedia2WordNet
• Spanish stemmer: https://github.com/Neuw84/SpanishInflectorStemmer
• Training Data for Wikiminer: https://github.com/Neuw84/Wikipedia353Spanish
• LiReWi: coming soon….
Web Demo
• LiDom Builder: http://galan.ehu.es/lidom/
61. Publications
A Combined Approach for Eliciting Relationships for Educational Ontologies Using Several
Knowledge Bases.
Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga.
Journal of Knowledge-Based Systems. Submitted.
LiteWi: A Combined Term Extraction Method for Eliciting Educational Ontologies from Textbooks.
Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga, Dan Roth.
Journal of the Association for Information Science and Technology, 67(2), pp. 380–399, 2016.
Testing Language Independence in the Semiautomatic Construction of Educational Ontologies.
Ángel Conde, Mikel Larrañaga, Ana Arruarte, Jon A. Elorriaga.
12th International Conference on Intelligent Tutoring Systems ITS 2014, Springer, Vol. 8474, pp.
545-550, 2014.
Automatic Generation of the Domain Module from Electronic Textbooks. Method and Validation.
Mikel Larrañaga, Ángel Conde, Iñaki Calvo, Jon A. Elorriaga, Ana Arruarte
IEEE Transactions on Knowledge and Data Engineering, 26(1), pp. 69-82, 2014.
Automating the Authoring of Learning Material in Computer Engineering Education.
Ángel Conde, Mikel Larrañaga, Iñaki Calvo, Jon A. Elorriaga, Ana Arruarte.
42nd Frontiers in Education Conference, pp. 1376-1381, 2012.
62. LiDom Builder: Automatising the Construction of Multilingual Domain Modules
Ángel Conde Manjón
GaLan Research Group – LSI department, University of the
Basque Country (UPV/EHU)
Supervisors:
Mikel Larrañaga Olagaray & Ana Arruarte Lasa
UPV/EHU
Editor's Notes
Good morning, everybody. I'm Ángel Conde Manjón, a member of the GaLan research group at the University of the Basque Country.
First of all, I would like to thank the committee members for attending this thesis defense.
(This thesis has been developed under the supervision of Dr. Larrañaga and Dr. Arruarte and supported by the GaLan Research Group.)
This thesis, called LiDom Builder, is about automatising the construction of Multilingual Domain Modules.
(Redundant: I am going to present my thesis to obtain the PhD degree in Computer Science from the University of the Basque Country.)
Well, I am going to start with some facts to put this work in context.
-- The first one is that Technology Supported Learning Systems (TSLSs) are very popular and broadly used nowadays.
For example…….
-- The second fact is that bilingual and multilingual….
-- Finally, I must say that acquiring the domain module, that is, the TSLS content, is a cost- and work-intensive task.
(Then, providing aid tools for building such systems and, especially, tools for developing the learning content for those systems is essential.)
Then, providing tools for … automatising the construction of multilingual domain modules is the main goal of this work.
Let's start with the introduction.
Nowadays, Technology Supported Learning Systems are very popular and broadly used.
For example,
The Domain Module is the core of any TSLS.
Any TSLS requires an appropriate representation of the knowledge to be mastered by the student, i.e., the Domain Module.
Building Domain Modules is costly in terms of time and difficulty.
Then, providing aid tools for building such systems and, especially, tools for developing the learning content for those systems is essential.
I am going to start by putting the work in context; we start from three facts:
- Technology is increasingly used
- Bilingual and multilingual contexts
- The difficulty of building the domain module, which defines the information the systems require to do their job
In a previous work, Larrañaga (2012) proposed a framework called DOM-Sortze to build domain ....
But why use textbooks?
Because the authors of textbooks face the same problems when writing their books. They include information about the domain topics, definitions, examples, and even exercises that allow students to master the contents.
Moreover, they structure the textbook in a way that facilitates understanding and learning.
In DOM-Sortze, the Domain Module acquisition process entails three tasks:
1. First, the electronic textbook is prepared for the knowledge elicitation tasks.
2. Once the internal representations of the outline and the body of the textbook have been extracted, the LDO is generated.
3. Finally, after building the LDO, the LOs are gathered.
In DOM-Sortze, the Domain Module is described by means of an Educational Ontology.
The LDO contains the main domain topics and the pedagogical relationships among them. (Not to be said aloud: pedagogical relationships can be structural, i.e. isA and partOf, or sequential, i.e. prerequisite and next.)
The set of Learning Objects (LOs) that will be used for mastering each domain topic (definitions, examples, exercises, etc.).
(approach presented throughout this thesis)
DOM-Sortze has two limitations….
First, it only supports the Basque language.
Second, the formalism used represents Domain Modules in a single language.
But multilingual…. Blah blah
Therefore, we should work on an answer to this need.
Taking into account the previous work and the main goal,
we should answer these questions to develop the specific objectives.
Blah blah…
These questions lead us to the structure of the presentation.
This presentation is organised in six main sections.
We have already gone through the introduction.
Next, our proposal for eliciting multilingual domain modules will be described.
Then, the three main parts of the system will be presented.
And finally, I will give some brief conclusions and future lines.
Well, it's time to focus on LiDom Builder….. the system we have built for building multilingual domain modules.
LiDom Builder is a framework that we have developed to deal with the task of automatising the construction of Multilingual Domain Modules.
An overview of the system is presented below, showing the three main tasks that need to be carried out in order to build Domain Modules:
Multilingual terminology
Pedagogical relationships
Multilingual Learning Object generation
In the first place, to be able to deal with multilingual domain modules, the formalism presented in DOM-Sortze has been extended.
That is, each topic has been assigned an identifier and different labels depending on the language.
Moreover, the LO formalism has been extended to support different languages and to define equivalents (definitions in different languages).
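The extended formalism described above (one identifier per topic, one label per language, and LOs that know their language and their equivalents) can be pictured with a minimal data-structure sketch. The class and field names are assumptions for illustration; the thesis formalism itself is an ontology and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Topic:
    topic_id: str
    # Language code -> label of the topic in that language.
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class LearningObject:
    lo_id: str
    topic_id: str
    kind: str       # e.g. "definition" or "example"
    language: str   # language code of the LO text
    text: str
    # Ids of LOs that are equivalent to this one in other languages.
    equivalents: List[str] = field(default_factory=list)


def link_equivalents(first: LearningObject, second: LearningObject) -> None:
    """Record that two LOs in different languages are equivalent."""
    if first.language == second.language:
        raise ValueError("equivalents are expected to be in different languages")
    first.equivalents.append(second.lo_id)
    second.equivalents.append(first.lo_id)


planet = Topic("t1", {"en": "Planet", "es": "Planeta"})
definition_en = LearningObject("lo1", "t1", "definition", "en", "A planet is ...")
definition_es = LearningObject("lo2", "t1", "definition", "es", "Un planeta es ...")
link_equivalents(definition_en, definition_es)
```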
In this thesis the following enhancements are added to DOM-Sortze.
First, a language identification procedure …
For the preprocessing, NLP parsers should be added for each language.
Then, the topic extraction and the relation extraction processes have been extended in the LDO gathering step.
Finally, the LO gathering has been enhanced by obtaining multilingual LOs.
During the last few years, knowledge resources such as Wikipedia and WordNet have been used for terminology extraction, relation acquisition and…. in general, for natural language processing.
That's why they are incorporated into LiDom Builder.
However, as working with Wikipedia entails considerable effort due to its size, WikipediaMiner has been used to interact with it.
For this work the following evaluation methodology has been used:
Two phases…. Tuning, evaluation..
Gold Standard, against which the results are compared
Expert validation, as the results of the system may be interesting for mastering the topics but not in the gold standard…. An expert evaluation….
Three books from different domains have been used: programming, astronomy and biology.
For some parts more resources have been used, but those will be explained later.
Now it's time to focus on one of the main parts of LiDom Builder, the one that takes care of the acquisition of multilingual terminology.
In DOM-Sortze, it was only possible to extract terminology using ErauzTerm (Alegria, 2004), for textbooks written in Basque.
LiTeWi, a tool for eliciting multilingual terminology, has been developed.
In LiTeWi, terminology acquisition entails two main steps: the identification of the candidates using diverse techniques, and the combination and refinement of the results to obtain the final set of terms.
Algorithms by other authors (the first three):
TF-IDF (Salton, 1988): besides the term frequency, considers the relevance of the terms in the corpus.
KP-Miner (El-Beltagy, 2009): a rule-based keyphrase extractor for English and Arabic.
CValue (Frantzi, 2000): takes into account the occurrence of term candidates as part of longer terms.
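As a reminder of what the first of these algorithms computes, here is a minimal TF-IDF sketch using the common formulation tf × log(N / df). It is not LiTeWi's implementation; the function name and the choice of this particular variant of the formula are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Score each term of each tokenised document with tf * log(N / df)."""
    total = len(documents)
    document_frequency: Counter = Counter()
    for doc in documents:
        document_frequency.update(set(doc))  # count each term once per document
    scores = []
    for doc in documents:
        term_frequency = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(total / document_frequency[term])
            for term, count in term_frequency.items()
        })
    return scores


# Toy corpus: "planet" appears everywhere, so it carries no weight.
scores = tf_idf([["star", "planet", "star"], ["planet", "orbit"]])
```

A term that occurs in every document gets a score of zero, which is why TF-IDF alone tends to miss very central domain terms and is combined with other extractors.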
Moreover, we have developed an algorithm called Shallow Parsing Grammar.
The Shallow Parsing Grammar algorithm has been developed on the hypothesis that terms can be found where LO fragments may be found .. (do not say this aloud because it may be confusing; keep it in case they ask)
This algorithm uses a grammar derived from Larrañaga (2012), with which the fragments that may contain LOs are selected.
To process the textbook with this algorithm:
First, we process the textbook with a grammar based on the previously mentioned one, with which we identify grammar structures that may contain DRs.
Then, we process the sentences with a shallow parser in order to extract the noun phrases.
Once all the algorithms have finished, the candidate selection process starts.
First, a combination step is carried out in which all the results from the different algorithms are merged.
Next, a mapping procedure to Wikipedia is carried out.
To map the terms to Wikipedia, a search procedure that matches Wikipedia article titles and their labels is performed.
However……
The same term may have different senses. For example….
Then we need to disambiguate them.
The first part is the Wikipedia mapping part. …..
The terms obtained in the previous step are related to their corresponding Wikipedia articles.
Those not mapped are filtered out.
This entails searching in Wikipedia to determine whether or not each selected term can be related to one or more Wikipedia articles, each one representing a possible sense/meaning of the term.
Depending on the stemmer used, there is a trade-off between precision and recall.
Problem:
To address the problem of multiple senses, a disambiguation step has been added.
For disambiguating the terms, a method based….
A domain knowledge step is added to improve the results.
We use Milne's system with important domain terms as input, the so-called GOLD TERM LIST.
How to choose them?
After some analysis we realised that longer terms are usually more specific. (you can see in the figure that)…..
We use CValue, which assigns more weight to those terms; furthermore, the top results are important terms of the domain.
A method that uses the Milne and Witten global disambiguation approach (Milne2008) is used to fulfil this task, to which end the Wikiminer Compare Service is used. This service provides a way of disambiguating term pairs using a classifier that takes as features:
• The data provided by Wikipedia. Wikipedia provides statistics about how an article label is associated to a sense/meaning. For example, 55% of “Java” labels refer to the programming language whereas 15% of them refer to the Indonesian island. These statistics yield three features for the classifier: the average, maximum and minimum prior probabilities of the two concepts.
• The semantic relatedness between the concepts. The relatedness score can be computed using the links of the articles as features. Milne2013 claims that “Wikipedia articles reference each other extensively, and at first glance the links between them appear to be promising semantic relations. Unfortunately, the article also contains links to many irrelevant concepts (e.g. terms not related to the domain of the analyzed book). Therefore, an individual link between two Wikipedia articles cannot be trusted”. There are different possibilities for computing the relatedness measure, for instance, using the article in-links (those inside the article that refer to other articles).
Both measures use different sets of links. The normalized distance measure is based on an approach that looks for documents that mention the terms of interest, and has been adapted to use the links made to articles. The vector similarity measure is based on an approach that looks for terms mentioned within two documents of interest, and has been adapted to use the links contained within articles.
However, there is no reason why each measure should not be applied to the other link direction.
Thus, each of the measures described above yields two features, one for in-links and the other for out-links. Finally, another measure taking into account the link counts for each article could be used. Different configurations have been tested. As pointed out by Milne2013, the more features used, the higher the performance is.
Therefore, the measure that combined the links-in, links-out and link-counts was selected for computing the relatedness score.
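The normalized distance measure mentioned above can be sketched as follows, using the link-based formulation usually attributed to Milne and Witten: relatedness = 1 − (log max(|A|,|B|) − log |A∩B|) / (log |W| − log min(|A|,|B|)), where A and B are the sets of articles linking to each concept and W is the set of all articles. The function name and the clamping to [0, 1] are assumptions, and this sketch covers only one link direction, not the combined links-in, links-out and link-counts measure that was finally selected.

```python
import math
from typing import Set


def link_relatedness(links_a: Set[int], links_b: Set[int], total_articles: int) -> float:
    """Normalized link distance turned into a relatedness score in [0, 1]."""
    common = links_a & links_b
    if not common:
        return 0.0  # no shared links: treated as unrelated
    larger = max(len(links_a), len(links_b))
    smaller = min(len(links_a), len(links_b))
    denominator = math.log(total_articles) - math.log(smaller)
    if denominator <= 0:
        return 0.0  # degenerate case: link sets as large as the whole collection
    distance = (math.log(larger) - math.log(len(common))) / denominator
    return max(0.0, 1.0 - distance)


# Toy link sets (article ids are made up).
same = link_relatedness({1, 2, 3}, {1, 2, 3}, 1000)
partial = link_relatedness({1, 2, 3, 4}, {3, 4, 5, 6}, 1000)
disjoint = link_relatedness({1, 2}, {3, 4}, 1000)
```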
… the term size in n-grams (number of words composing the term) increases. Therefore, the more words a term has, the more specific it is. Nevertheless, domain-relevant terms are required. Hence, the monosemic terms with the highest CValue score are chosen for the gold term list. This …
Finally, a majority vote procedure is carried out to obtain the final sense, involving the different outputs obtained with the gold term list……
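The majority vote can be sketched in a few lines, assuming that each gold term, when compared with the ambiguous term, yields one proposed sense. The function name and the example votes are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def majority_sense(votes: List[str]) -> Optional[str]:
    """Return the sense proposed most often; None when there are no votes."""
    if not votes:
        return None
    # On a tie, Counter.most_common keeps the sense that was seen first.
    return Counter(votes).most_common(1)[0][0]


# One vote per gold term (toy data).
sense = majority_sense([
    "Java (programming language)",
    "Java (programming language)",
    "Java",
])
```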
After having disambiguated all the terms, we try to filter out those not related to the domain in a filtering step.
In this step, those terms which are not related to the domain are deleted.
In this case we use the Astronomy domain …
For this task, the gold term list built in the disambiguation step is used.
This task attempts to relate each elicited term to the terms in the gold term list, to which end the Wikiminer Comparing Service has been employed.
First, it discards those topics below the relatedness threshold.
Then, it requires the defined threshold to be passed for more than N gold topics.
First, the Wikiminer Comparing Service computes the domain relatedness of each term. Those topics whose score is below the threshold are dropped.
Finally, those terms which are related to at least the minimum number of gold terms are selected.
… the candidate term to be related to at least one term of the gold term list.
Therefore, this is the set-up that achieves the best compromise between recall and precision.
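The filtering step can be sketched as follows, with the relatedness service abstracted as a function argument. The names, the threshold of 0.4 and the toy scores are assumptions; `min_gold=1` mirrors the set-up in which a candidate must be related to at least one gold term.

```python
from typing import Callable, Dict, FrozenSet, List


def filter_domain_terms(candidates: List[str],
                        gold_terms: List[str],
                        relatedness: Callable[[str, str], float],
                        threshold: float = 0.4,
                        min_gold: int = 1) -> List[str]:
    """Keep candidates related (>= threshold) to at least min_gold gold terms."""
    kept = []
    for term in candidates:
        related = sum(1 for gold in gold_terms if relatedness(term, gold) >= threshold)
        if related >= min_gold:
            kept.append(term)
    return kept


# Made-up scores standing in for a relatedness service.
TOY_SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"Redshift", "Emission spectrum"}): 0.7,
    frozenset({"Redshift", "Solar System"}): 0.3,
    frozenset({"Java", "Emission spectrum"}): 0.05,
    frozenset({"Java", "Solar System"}): 0.02,
}


def toy_relatedness(a: str, b: str) -> float:
    return TOY_SCORES.get(frozenset({a, b}), 0.0)


kept = filter_domain_terms(["Redshift", "Java"],
                           ["Emission spectrum", "Solar System"],
                           toy_relatedness)
```

Raising `min_gold` makes the filter stricter: with `min_gold=2` even "Redshift" would be dropped in this toy example, which illustrates the precision/recall trade-off mentioned above.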
The final step of LiTeWi entails mapping the terms to other languages; using Wikipedia information, we obtain those links directly whenever they are available.
LiTeWi has been evaluated using Gold Standard and Expert validation.
The Gold Standard has been based on the terms appearing in the textbooks.
LiTeWi has been tuned up with….
And evaluated on……
The first book used for the evaluation is the Introduction to Astronomy (Morison, 2008) textbook. This book consists of 150 pages of plain text and over 110,000 words.
The index is composed of 378 unique terms, of which 114 are single-word terms (1-grams), 189 terms are 2-grams, 57 terms are 3-grams, and 18 terms are 4-grams.
322 (out of 378) of the index terms were related to one or more Wikipedia articles. That is to say, 85.18% of the terms refer to at least one Wikipedia article, such a proportion being the best recall achievable.
The second book used for the evaluation is the Introduction to Molecular Biology (Raineri, 2010). This book consists of 139 pages of plain text with over 70,000 words.
The index is composed of 274 unique terms, of which 116 are single-word terms, 119 are 2-grams, 35 are 3-grams, 3 are 4-grams, and 1 is a 5-gram. For this textbook, 220 out of 274 of the index terms were related to one or more Wikipedia articles. Hence, the best achievable recall is 81.30%.
In this table we can see the general results of LiTeWi; we have also tested the results of each step, but those are out of scope for this presentation.
Generally better results.
We can see that we have quite good results in the different domains; this can be attributed to the use of different algorithms and to the use of Wikipedia, which is multi-domain.
The difference is especially remarkable in recall----
Well after finishing with the acquisition of Multilingual terminology our focus is with another part of LiDom…. The part that takes care of pedagogical relationships
In DOM-Sortze, the process of acquiring relationships is divided into two parts. On the one hand, there is a heuristic process for getting relations from the outline.
On the other hand, one algorithm is used to extract relations from the whole textbook, but only for Basque.
An extension has been developed for the outlines, where the process is generalized for English and then improved.
For processing the textbook, a new tool called LiReWi has been developed.
The identification of pedagogical relationships has been addressed:
DOM-Sortze approach for getting outlines has been extended.
Generalization for English + evaluation.
Improve its knowledge acquisition results using Wikipedia.
For the whole textbook analysis, a new tool has been designed that improves the acquired knowledge.
First we will focus on the outline process….
Why?
DOM-Sortze uses a heuristic process for processing the outline.
Each outline item is considered a domain topic. By default, partOf relations are identified; then isA relations are refined using heuristics.
(A lack of domain knowledge was detected.) Faulty isA identification was detected … due to the lack of domain knowledge, for example when detecting diseases.
In LiDom
First an extension …..
In order to deal with this lack of knowledge, WikiTaxonomy is used: Ponzetto (2007) derived a large-scale taxonomy containing isA relationships from Wikipedia.
Each index item is considered as a domain topic.
________________________________________________________________
The structure of the document outline is used as a means to gather pedagogical relationships.
A subitem of a general topic is used to explain part of it or a particular case of it.
Several heuristics can fire on the same group of subitems; in that case, the most confident one is returned.
The default heuristic (partOf), is returned when no other heuristic condition is met.
Some of those heuristics rely on Natural Language Processing (NLP) services, for instance, those to identify entity names.
The outline analysis process consists of two phases:
In the basic analysis, the main topics of the domain and the relationships between these topics are mined from the outline.
In the heuristic analysis, the results of the basic analysis are refined based on a set of heuristics that categorize the relationships.
The heuristics entail the condition to be matched, and the post-condition, i.e., the relationships that are recognized.
Group heuristics identify relationships from homogeneous subitems or if the outline item entails certain keywords.
Individual heuristics are tested on every subitem in the case no Group heuristic is fired.
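The way the heuristics are chosen can be sketched as follows. This is only an illustration under assumptions: the Heuristic class, the analyse_item function and the keyword heuristic are made up for this note and are not the DOM-Sortze implementation. Group heuristics are tried first, the most confident fired one wins, individual heuristics are tried per subitem otherwise, and partOf is the default.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

DEFAULT_RELATION = "partOf"  # returned when no heuristic condition is met


@dataclass
class Heuristic:
    name: str
    relation: str        # post-condition: the relationship that is recognized
    confidence: float
    condition: Callable[[str, List[str]], bool]  # (item, subitems) -> fired?


def analyse_item(item: str, subitems: List[str],
                 group_heuristics: List[Heuristic],
                 individual_heuristics: List[Heuristic]) -> Dict[str, str]:
    """Return the relationship between each subitem and its parent item."""
    fired = [h for h in group_heuristics if h.condition(item, subitems)]
    if fired:
        # Several group heuristics may fire: keep the most confident one.
        best = max(fired, key=lambda h: h.confidence)
        return {sub: best.relation for sub in subitems}
    result = {}
    for sub in subitems:
        # Individual heuristics are tested only when no group heuristic fired.
        fired = [h for h in individual_heuristics if h.condition(item, [sub])]
        result[sub] = (max(fired, key=lambda h: h.confidence).relation
                       if fired else DEFAULT_RELATION)
    return result


# Hypothetical keyword heuristic: an item titled "Types of ..." suggests isA.
keyword_heuristic = Heuristic(
    name="keyword", relation="isA", confidence=0.9,
    condition=lambda item, subs: item.lower().startswith("types of"))

stars = analyse_item("Types of stars", ["Red giants", "White dwarfs"],
                     [keyword_heuristic], [])
solar = analyse_item("The Solar System", ["Earth", "Mars"],
                     [keyword_heuristic], [])
```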
In this algorithm we want to refine false partOf relations using the domain knowledge contained in Wikipedia.
First we identify sibling nodes…
Then we categorize them with Ponzetto's taxonomy.
We want to refine false partOf relations into isA using domain-specific knowledge: look for siblings, categorize them with Ponzetto's taxonomy, and map those that form a group to their parent to form an isA relation.
Identify groups of sibling nodes (topics) of the LDO extracted from the outline;
select the groups of leaf nodes in which the partOf relationship has been identified to apply the subsequent steps;
normalize the text of the nodes (removing plural marks and apostrophes, and ignoring case differences);
link every node to those Wikipedia articles which are labeled with the normalized text of the node;
run a disambiguation process based on Wikiminer to map each node to a unique article (to be checked);
process every group using Ponzetto and Strube’s taxonomy [15] to look for a common ancestor;
infer isA relationships in those groups that share a common ancestor, as long as it does not appear at the top levels of the taxonomy.
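The last two steps can be sketched as follows. This is a toy illustration under assumptions: the taxonomy is a small hand-made dictionary rather than Ponzetto and Strube's resource, and the function names, the depth limit and the set of top-level categories are invented for this note.

```python
from typing import Dict, List, Set


def ancestors(node: str, parents: Dict[str, List[str]], max_depth: int) -> Set[str]:
    """Collect the ancestors of a node, climbing at most max_depth levels."""
    seen: Set[str] = set()
    frontier = {node}
    for _ in range(max_depth):
        frontier = {p for n in frontier for p in parents.get(n, [])} - seen
        if not frontier:
            break
        seen |= frontier
    return seen


def refine_group(siblings: List[str], parents: Dict[str, List[str]],
                 top_level: Set[str], max_depth: int = 3) -> str:
    """Return 'isA' if all siblings share a non-top-level ancestor, else 'partOf'."""
    common = None
    for sibling in siblings:
        found = ancestors(sibling, parents, max_depth)
        common = found if common is None else common & found
    # Ancestors at the top of the taxonomy are too generic to justify an isA.
    common = (common or set()) - top_level
    return "isA" if common else "partOf"


# Toy taxonomy (child -> parents); not real WikiTaxonomy data.
toy_parents = {
    "Earth": ["Planets"],
    "Mars": ["Planets"],
    "Planets": ["Astronomical objects"],
    "Astronomical objects": ["Entities"],
    "Syntax": ["Grammar"],
    "Grammar": ["Entities"],
}
toy_top = {"Entities"}

planets_relation = refine_group(["Earth", "Mars"], toy_parents, toy_top)
mixed_relation = refine_group(["Earth", "Syntax"], toy_parents, toy_top)
```

In the toy data, Earth and Mars share the ancestor Planets, whereas Earth and Syntax only meet at the top-level category, so that group keeps its partOf relation.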
57 outlines from textbooks of different courses and domains have been processed.
Gold-standard approach: manually defined LDOs were used as the optimal output. The LDOs were restricted to the topics referred to in the outlines and the structural relationships between those topics.
-“isA” relationship: “Earth is a planet”.
-“partOf” relationship: “Earth belongs to the Solar System”
A total of 1197 partOf and 483 isA relations were evaluated.
In the next point we are going to present the results.
The lack of knowledge on certain domains significantly affected the performance. For instance, it was observed that many of the topics involved in the missing isA relationships contained proper names; however, the entity name recognizer used in the experiment was unable to identify them.
Using the Wikipedia-enhanced process, the results are quite similar; nevertheless, with regard to isA relationships, the recall has dramatically increased from 21.20% to 50.53%, whereas the precision was hardly affected (77.30% vs. 78.95%).
Let’s move on.
The overall performance has improved (87.70% precision and recall). Regarding partOf relationships, the recall has slightly decreased (96.49% vs. 98.66%) but the precision has slightly increased from 84.12% to 89.12%.
Well, next I am going to talk about the tool that I have designed to deal with relationship identification processing the whole textbook.
Regarding the elicitation of relationships from the document body……
LiReWi: a Relationship Extractor for Educational Ontologies from whole documents.
In order to map the terms or topics from Wikipedia to WordNet, the following process is carried out.
- First, we take two works that have already addressed this task and compare their results; if their outputs agree, the system returns that identifier.
- If the results differ, a Page Rank disambiguation step is carried out. For that we employ UKB by Aguirre and Soroa. Disambiguation requires a context; the procedure followed is similar to the one used in LiTeWi, but in this case we require the context topics to have only one sense in WordNet.
After the procedure is done, the WordNet identifier is returned.
The mapper looks first for the appropriate equivalent synset in the mappings identified in the BabelNet project (Navigli2012), and also in the mappings discovered by Fernando2013. If the same synset is found in both cases, the mapper assumes that there are no ambiguity problems and returns the identified synset. Otherwise, a disambiguation process is carried out to identify which of the candidate synsets is the appropriate one. To this end, a Page Rank Mapping Disambiguation step is carried out using UKB (Aguirre:EACL:2009), a tool for Word Sense Disambiguation and for determining lexical similarity using a pre-existing knowledge base such as Wikipedia or WordNet. UKB requires a context to fulfil its goal. The context is obtained from the topics extracted by LiTeWi along with the domain relatedness score LiTeWi assigned to each of them. The topics with the highest domain relatedness score and with a unique meaning in WordNet constitute the context that allows choosing the synset for the topic.
In the example of Figure 5.4, the synsets returned by the Navigli2012 and Fernando2013 mappings are different. Therefore, the Page Rank Mapping Disambiguation step is carried out to determine the final synset of syntax in WordNet. The context used in the example entails topics such as Programming, Menu bar and Java. The Page Rank Mapping Disambiguation mechanism could select a different synset from those proposed by Navigli2012 and Fernando2013.
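The agreement check followed by disambiguation can be sketched as follows. This is a minimal sketch under assumptions: the two mappings are plain dictionaries, the synset identifiers are made up, and the disambiguate argument is a hypothetical callable standing in for the UKB step (it does not reproduce the UKB interface).

```python
from typing import Callable, Dict, List, Optional

Disambiguator = Callable[[str, List[str]], Optional[str]]


def map_topic_to_synset(topic: str,
                        babelnet_mapping: Dict[str, str],
                        fernando_mapping: Dict[str, str],
                        disambiguate: Disambiguator) -> Optional[str]:
    """Return the WordNet synset identifier chosen for a Wikipedia topic."""
    first = babelnet_mapping.get(topic)
    second = fernando_mapping.get(topic)
    if first is not None and first == second:
        # Both resources agree: no ambiguity is assumed.
        return first
    # Otherwise the candidates go to the disambiguation step, which may
    # even choose a synset that neither resource proposed.
    candidates = [s for s in (first, second) if s is not None]
    return disambiguate(topic, candidates)


# Made-up identifiers and a stub disambiguator, for illustration only.
babelnet = {"Java": "java-1", "Syntax": "syntax-1"}
fernando = {"Java": "java-1", "Syntax": "syntax-2"}
calls = []


def stub_disambiguator(topic: str, candidates: List[str]) -> Optional[str]:
    calls.append(topic)
    return candidates[-1] if candidates else None


java_synset = map_topic_to_synset("Java", babelnet, fernando, stub_disambiguator)
syntax_synset = map_topic_to_synset("Syntax", babelnet, fernando, stub_disambiguator)
```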
To elicit the pedagogical relationships between the domain topics, LiReWi follows the procedure shown in Figure 5.3. First, all the topics are mapped to the diverse knowledge bases (e.g. Wikipedia, WordNet and others derived from both) that will be used to identify the relationships. Then, several relationship extractors, each using a different approach, are concurrently run to elicit candidate relationships. Finally, the results are combined and filtered to obtain the final set of pedagogical relationships. In the next subsections, each step is described in more detail.
Again, LiReWi has been firstly tested on the Principles of Object-Oriented Programming (Wong2010) in order to determine its optimal set-up and, then, evaluated on the Introduction to Astronomy (Morison2008) textbook.
To extract pedagogical relationships between topics, LiReWi uses, in addition to shallow parsing techniques, several knowledge bases such as Wikipedia, WordNet, WikiTaxonomy, WibiTaxonomy and WikiRelations. To this end, it is necessary to map every topic to its corresponding entries in those knowledge bases. The topics identified by LiTeWi are already mapped and disambiguated to Wikipedia articles; WikiTaxonomy, WikiRelations and WibiTaxonomy are based on Wikipedia articles.
However, to be able to use WordNet, the topics must still be mapped to WordNet entries. WordNet organizes words (nouns, verbs, adjectives and adverbs) into sets of cognitive synonyms called synsets. Each synset refers to a distinct concept that can be referred to using different forms. Navigli2012 and Fernando2013 faced a similar problem and defined the mappings or equivalences between Wikipedia articles and WordNet synsets.
Candidate Relationship Extraction: path-based extractors
WordNet (Fellbaum1998) can be considered as a huge graph of topics connected by semantic relationships.
WibiTaxonomy Extractor: WibiTaxonomy (Flati, 2014) is a knowledge base that comprises two interconnected taxonomies.
WikiTaxonomy Extractor … take the blocks from the previous one
Taxonomy of the resource, mapping to the taxonomy
I look for a path ….
Hypernymy: isA
Meronymy
Paths of limited length are used to infer the relationships. The confidence depends on the path length.
The Wikipedia article taxonomy and the category taxonomy.
Extracting relationships from WibiTaxonomy entails two steps.
First, each topic is mapped to the articles/category taxonomy using the mapped Wikipedia article of each topic.
Paths of a limited length to infer the relationships from both articles/categories taxonomies
The WikiTaxonomy (Ponzetto & Strube, 2007) is a huge taxonomy derived from the Wikipedia category system where all the links between categories are represented by isA relationships.
Moreover, WikiTaxonomy contains a dictionary where the articles are mapped to the corresponding category entries in the taxonomy.
First, each topic is mapped to its corresponding WikiTaxonomy categories
Then, a DFS search is carried out to find the shortest upwards path between the topics considering the categories in the WikiTaxonomy
Search limited in length.
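The length-limited upward search can be sketched as follows. This is an illustration under assumptions: the taxonomy is a toy dictionary, the function names are made up, and the confidence function (inverse of the path length) is only a placeholder, since the notes state that confidence depends on path length without giving the formula.

```python
from typing import Dict, List, Optional


def shortest_upward_path(start: str, target: str,
                         parents: Dict[str, List[str]],
                         max_length: int) -> Optional[int]:
    """Depth-limited DFS: length of the shortest upward path, or None."""
    best: Optional[int] = None

    def dfs(node: str, depth: int, visiting: frozenset) -> None:
        nonlocal best
        if best is not None and depth >= best:
            return  # cannot improve on the shortest path found so far
        if node == target:
            best = depth
            return
        if depth == max_length:
            return  # search limited in length
        for parent in parents.get(node, []):
            if parent not in visiting:
                dfs(parent, depth + 1, visiting | {parent})

    dfs(start, 0, frozenset({start}))
    return best


def path_confidence(length: int) -> float:
    """Placeholder confidence: shorter paths score higher (assumed formula)."""
    return 1.0 / length


# Toy category taxonomy (child -> parents); not real WikiTaxonomy data.
toy_taxonomy = {
    "Earth": ["Terrestrial planets", "Planets"],
    "Terrestrial planets": ["Planets"],
    "Planets": ["Astronomical objects"],
}

earth_to_planets = shortest_upward_path("Earth", "Planets", toy_taxonomy, 3)
too_far = shortest_upward_path("Earth", "Astronomical objects", toy_taxonomy, 1)
```

When a path exists within the limit, an isA candidate between the two topics would be proposed with the corresponding confidence.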
WikiRelations (Nastase, 2008) knowledge base comprises a big set of tuples between Wikipedia categories containing several kinds of relationships.
In this work, only the subset of tuples containing isA or partOf relationships has been employed.
Map directly each topic to its corresponding topic.
Confidence based on the number of tuples containing that relation.
Map topics to their corresponding categories.
This slide depicts the Shallow Parsing Grammar extractor.
This extractor is based on Larrañaga's work for Basque….
The algorithm follows this procedure…….
First…..
This extractor aims to elicit sequential relationships such as prerequisite and pedagogicallyClose. The Sequential Extractor uses the information contained in the processed textbook along with information gathered from Wikipedia to extract these kinds of relationships. In particular, it uses the co-occurrences of the topics within the sentences along with the Wikipedia link structure between articles. To use the information of the link structure between articles, this module uses WikiMiner (Milne and Witten, 2013). Next, the procedure is described (see Figure 5.15).
First, as occurs in the Shallow Parsing Grammar Extractor, the extractor identifies the topics that are being referred to in the text. Once again, the system applies a simple matching algorithm where the compound terms take precedence over the simple ones. The output of this process is a list of sentences that contain mentions of the input topics. Next, for each of those sentences, a reference relationship is defined between each pair of topics appearing in the sentence if the first topic refers to the second. A topic is considered to refer to another if a link out from the first topic to the second exists in Wikipedia with a relatedness score above an empirically determined threshold. LiReWi uses WikiMiner to compute the relatedness score of two topics. For example (see Figure 5.16), topic1 and topic2, which have links in both directions in Wikipedia, appear in the same sentence. As their relatedness is higher than the empirically determined threshold (0.7), a link between them is annotated. Topic3 only references Topic2, but their relatedness is below the threshold.
Finally, for each linked topic pair, a sequential relationship is inferred. If the links between both topics are balanced, i.e., the number of links from the first topic to the second is similar to the number of links from the second to the first, a pedagogicallyClose relationship between both topics is inferred. Otherwise, a prerequisite relationship is inferred from the topic with the highest number of outgoing links to the topic with the highest number of incoming links. Figure 5.17 shows two examples in which a pedagogicallyClose and a prerequisite relationship are inferred using this procedure.
The confidence of the extracted relationships is calculated using Formula 5.4, where b is the base confidence (0.6), top1m is the number of links from the first topic, top2m is the number of links from the second topic and low is the threshold determining the minimum number of links for a relationship to be inferred, 2 in this case.
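The decision between pedagogicallyClose and prerequisite can be sketched as follows. This is a rough sketch under assumptions: the function name, the balance tolerance and the way the minimum number of links is applied (to the total of both directions) are choices made for this note. Only the 0.7 relatedness threshold and the minimum of 2 links come from the text, and Formula 5.4 is not reproduced here.

```python
from typing import Optional, Tuple

Relationship = Tuple[str, str, str]  # (relation, source topic, target topic)


def infer_sequential(topic1: str, topic2: str,
                     links_1_to_2: int, links_2_to_1: int,
                     relatedness: float,
                     relatedness_threshold: float = 0.7,
                     low: int = 2,
                     balance_tolerance: float = 0.25) -> Optional[Relationship]:
    """Infer a sequential relationship for a pair of co-occurring topics."""
    if relatedness <= relatedness_threshold:
        return None  # the topics are not related enough to be linked
    total = links_1_to_2 + links_2_to_1
    if total == 0 or total < low:
        return None  # too few links to infer anything (assumed reading of `low`)
    if abs(links_1_to_2 - links_2_to_1) / total <= balance_tolerance:
        # Links in both directions are similar in number: balanced pair.
        return ("pedagogicallyClose", topic1, topic2)
    # Unbalanced: from the topic with more outgoing links to the other one.
    if links_1_to_2 > links_2_to_1:
        return ("prerequisite", topic1, topic2)
    return ("prerequisite", topic2, topic1)


balanced = infer_sequential("A", "B", 5, 5, relatedness=0.8)
unbalanced = infer_sequential("A", "B", 1, 6, relatedness=0.8)
unrelated = infer_sequential("A", "B", 5, 5, relatedness=0.5)
```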
Here we can see, on the left, that a pedagogicallyClose relationship is inferred when the mentions are balanced.
On the other hand, if the mentions are not balanced, a prerequisite relationship is inferred…..
More confidence with more extractors, from which it follows that…… conflicts out
LiReWi has been evaluated using two approaches: gold standard and expert validation.
This time, LiReWi has first been tuned up on the Principles of Object-Oriented Programming (Wong2010) textbook in order to determine its optimal set-up and, subsequently, evaluated on the Introduction to Astronomy (Morison2008) textbook.
For that, the 199 topics from LiTeWi….
First, an evaluation of the mapping techniques is presented. Then, the evaluation of the candidate relationship extraction is presented and, finally, the evaluation of the combination and filtering is described.
For the gold standard evaluation, four experts stated the set of relationships.
1. This presentation is organised in five main sections.
2. First, the context and motivation for this work will be presented in the Introduction.
3. Next, our proposal will be described, focusing on both the process carried out and the framework that has been developed for that purpose.
4. Following that, the evaluation conducted to validate our proposal will be shown.
5. Finally, the conclusions and the future lines identified will be presented.
DR Grammar was evaluated by analyzing the atomic LOs.
(say this aloud, along with the LOs extracted)
For the DR Grammar, the results were similar to those of the experiments conducted over textbooks in Basque (Larrañaga 2012a).
Lower accuracy for problem statements (imperative cases are difficult to detect in English).
For the Learning Objects we
Evaluating the quality of the content in Wikipedia and WordNet is beyond the scope of this experiment.
It is assumed that all the definitions and LOs extracted from Wikipedia and WordNet are correct.
Therefore, the experiment conducted here consisted of measuring how much the use of LiLoWi enhanced the LO coverage for the LDO topics, and how many multilingual LOs were elicited.
Here we can see the results of the evaluation of LiLoWi against DOM-Sortze in the previously described evaluation.
LiLoWi outperforms DOM-Sortze in term coverage for LOs by a margin of 25%. Moreover, we gain from references.
For the final part of the presentation, some conclusions and future work will be presented
Provision
In particular, a OWL representation of multilingual learning ontologies...
LiTeWi is the module responsible for the elicitation of multilingual terms for Educational Ontologies from electronic documents. It combines different approaches such as TF-IDF, KP-Miner, CValue and Shallow Parsing Grammar for the unsupervised term extraction using Wikipedia as a knowledge base. The approach carried out by LiTeWi entails three main steps: the identification of the topic candidates; the combination and the refinement of the results to obtain the set of terms; and, finally, the mapping of the terms to other languages in Wikipedia.
LiReWi (CondeLiReWi) is the module that implements a method for the elicitation of pedagogical relationships for Educational Ontologies from electronic document bodies. It combines shallow parsing techniques with several knowledge bases such as Wikipedia, WordNet, WikiTaxonomy, WibiTaxonomy and WikiRelations to elicit isA, partOf, prerequisite and pedagogicallyClose relationships. LiReWi also performs a three-step procedure to fulfil its task: first, all the topics are mapped to the diverse knowledge bases that will be used to identify the relationships; then, several relationship extractors, each using a different approach, are concurrently run to elicit candidate relationships; and, finally, the results are combined and filtered to obtain the final set of pedagogical relationships.
In LiDom Builder the process of eliciting structural relationships (isA, partOf ) from document outlines has also been enhanced, with the inclusion of Wikipedia as an additional resource (Conde2014).
LiLoWi is the module that enables the elicitation of new LOs, including some multilingual LOs, from both the original textbook body and different knowledge bases such as Wikipedia or WordNet. Once each topic of the LDO is mapped to Wikipedia and WordNet, LiLoWi retrieves the information from those two resources using their corresponding LO Extractors. Before incorporating Wikipedia and WordNet to the LO acquisition process, the validity of the proposal presented in LiDom Builder to incorporate the English language has also been considered and tested (Conde2012).
Although the modular design of LiDom Builder facilitates the inclusion of a new language, some resources must be defined, in particular the heuristics and the grammars that allow the knowledge elicitation and the Discourse Markers for that language. Automatising the development of such kinds of resources will remarkably reduce the workload in the integration of a new language. In the last few years, great advances have been made in Machine Translation. The research in that field might help to semi-automatically develop the grammars and heuristics for a new language from those already defined for a particular language.
Furthermore, similar structures or equivalent patterns have been observed in the supported languages. Therefore, a meta model describing the generic patterns could be defined and rule-based transformations applied to obtain the specific grammars and heuristics for a particular language.
LiDom Builder could try to identify LOs that are equivalents or translations in other languages. To this end, different means will be explored. For example,
• Latent Semantic Analysis (LSA) would be used to generate a model of each LO, and this model would be translated using Machine Translation techniques to obtain its equivalents in other languages. If a similar model were found for the translated model, then the equivalence between their corresponding LOs would be inferred.
• Additionally, another Machine Translation based approach might also be explored. To determine if two LOs, say LO1 in English and LO2 in French, are equivalent, LiDom Builder could take advantage of Machine Translation techniques by generating their automatic translations before comparing them. If the translated LO1 (LOt1) were similar to LO2, or the translation of LO2 (LOt2) were similar to LO1, they could be considered equivalent. Diverse similarity and text reuse metrics would be tested in this approach.
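The second approach can be sketched as follows. This is a toy illustration under assumptions: the translators are word-by-word dictionary lookups standing in for a real Machine Translation system, Jaccard token overlap is just one possible similarity metric, and the 0.6 threshold and all names are invented for this note.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def jaccard(text_a: str, text_b: str) -> float:
    """Token-set overlap between two texts (one of many possible metrics)."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def are_equivalent(lo1: str, lo2: str,
                   translate_1_to_2: Translator,
                   translate_2_to_1: Translator,
                   threshold: float = 0.6) -> bool:
    """LO1 and LO2 are considered equivalent if either translation is similar."""
    return (jaccard(translate_1_to_2(lo1), lo2) >= threshold
            or jaccard(translate_2_to_1(lo2), lo1) >= threshold)


def dictionary_translator(lexicon: Dict[str, str]) -> Translator:
    """Toy stand-in for Machine Translation: word-by-word dictionary lookup."""
    return lambda text: " ".join(lexicon.get(w, w) for w in text.lower().split())


en_fr = {"the": "la", "earth": "terre", "is": "est", "a": "une", "planet": "planète"}
fr_en = {fr: en for en, fr in en_fr.items()}
to_french = dictionary_translator(en_fr)
to_english = dictionary_translator(fr_en)

same = are_equivalent("the earth is a planet", "la terre est une planète",
                      to_french, to_english)
different = are_equivalent("the earth is a planet", "le soleil est une étoile",
                           to_french, to_english)
```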
For example, adding new algorithms and techniques from the NLP domain, or using more advanced techniques for filtering unwanted acquired knowledge.
Furthermore, given the multilingual nature and the layout of Wikipedia, LiDom Builder is able to generate multilingual definitions from Wikipedia. Using ErauzOnt on Wikipedia, or other additional resources, would allow the identification of additional monolingual LOs. To generate multilingual LOs from these resources, two different approaches could be applied.
It can be observed that its representation is not far from the representation used in LiDom Builder to visualize the LDO, i.e., the Learning Domain Ontology. The elicitation of new types of relationships in LiDom Builder, relationships different from the currently identified pedagogical relationships, would allow the automatic generation of concept maps related to the domain considered in the textbook that LiDom Builder used as a source. The concept maps, along with their localised views, would constitute a new kind of multilingual LOs.
They can be extended,….. LiTeWi, for example, to Spanish, and also with more algorithms ....
These are the main publications related to this thesis.
=======================================================
There is another one under review and, finally, I would like to publish another one summarizing the whole process.
We have come to the end of the presentation. I’d just like to thank you for listening and would be pleased to take your comments and questions now.
The presentation is divided into four main parts. First, an introduction to show the motivation and objective of the work. Next, our proposal to meet that objective, the SIgBLE abstract architecture. In the third part I will show the application of this abstract architecture in a real environment and the evaluation carried out. Finally, I will present the conclusions of the work.
For enhancing the identification of the pedagogical relationships
Conflict is found between Earth isA Planet and Planet isA Earth proposals.
The system looks at the link structure of the topic in Wikipedia, along with the confidence of the extracted relationships, to determine the final relationship. In the figure, Earth isA Planet has higher confidence than Planet isA Earth.
In addition, “Earth” has a link to “Planet” in Wikipedia, whereas “Planet” does not have a link to “Earth”.
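A conflict of this kind could be resolved along the following lines. This is a sketch under assumptions: how the confidence and the link structure are actually combined is not described in these notes, so the additive link bonus, the class and function names, and the confidence values are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Candidate:
    source: str
    relation: str
    target: str
    confidence: float


def resolve_conflict(first: Candidate, second: Candidate,
                     wikipedia_links: Dict[str, Set[str]],
                     link_bonus: float = 0.1) -> Candidate:
    """Keep the candidate favoured by confidence plus Wikipedia link evidence."""
    def score(candidate: Candidate) -> float:
        # Assumed scoring: a link from source to target supports the candidate.
        linked = candidate.target in wikipedia_links.get(candidate.source, set())
        return candidate.confidence + (link_bonus if linked else 0.0)

    return max((first, second), key=score)


# Made-up confidences; the link structure mirrors the example in the slide.
earth_is_planet = Candidate("Earth", "isA", "Planet", confidence=0.8)
planet_is_earth = Candidate("Planet", "isA", "Earth", confidence=0.4)
links = {"Earth": {"Planet"}, "Planet": set()}

winner = resolve_conflict(earth_is_planet, planet_is_earth, links)
```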
In the next point we are going to present the results.
The lack of knowledge on certain domains significantly affected the performance. For instance, it was observed that many of the topics involved in the missing isA relationships contained proper names; however, the entity name recognizer used in the experiment was unable to identify them. A training process would be necessary to fulfill such purpose.
Table 5.11 shows the results of the evaluation of the mapping step. The BabelNet approach led to the highest precision, 100%, but its recall was the lowest, with only 14.73%. Fernando’s method, on the other hand, led to 83.33% precision with 18.42% recall. Our approach, which combines both methods with UKB, results in 97.82% precision and 23.68% recall, showing that it greatly increases the recall while minimizing the loss of precision. The F1-score is also shown in the table.
Our approach: combining both the BabelNet and Fernando’s approaches with UKB (Aguirre, 2009).
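For reference, the F1-score is the harmonic mean of precision and recall, and can be recomputed from the figures quoted above with a few lines. The function and variable names are made up for this note, and the values obtained this way may differ slightly from those in Table 5.11 because of rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs quoted above for the mapping step.
mapping_results = {
    "BabelNet": (1.0000, 0.1473),
    "Fernando": (0.8333, 0.1842),
    "Combined with UKB": (0.9782, 0.2368),
}

f1_by_method = {name: f1_score(p, r) for name, (p, r) in mapping_results.items()}
for name, value in f1_by_method.items():
    print(f"{name}: F1 = {value:.2%}")
```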