This document presents a doctoral dissertation defense on using semantic relatedness to evaluate course equivalencies. The presentation includes an introduction, surveys knowledge sources and related work, describes two approaches to measuring semantic relatedness between courses, and discusses experimental results comparing the approaches.
Semantic Relatedness for Evaluation of Course Equivalencies
1. Introduction Knowledge Sources Related Work First Approach Second Approach Summary References
Semantic Relatedness for Evaluation of Course
Equivalencies
Doctoral Dissertation Defense
Beibei Yang
Department of Computer Science
University of Massachusetts Lowell
July 23, 2012
Outline
1 Introduction
2 Knowledge Sources
3 Related Work
4 First Approach
5 Second Approach
6 Summary
NLP and Education
Many NLP techniques have been adapted to the education field for:
automated scoring and evaluation
intelligent tutoring
learner cognition
However, few techniques address the identification of transfer
course equivalencies.
Why is it important to suggest transfer course
equivalencies?
National Association for College Admission Counseling, 2010
“. . . less attention is focused on the transfer admission process,
which affects approximately one-third of students beginning at
either a four- or two-year institution during the course of their
postsecondary careers.”
National Center for Education Statistics, 2005
“For students who attained their bachelor’s degrees in 1999–2000,
59.7 percent attended more than one institution during their
undergraduate careers and 32.1 percent transferred at least once.”
UML’s course transfer dictionary
Course descriptions
C1 : Analysis of Algorithms
Discusses basic methods for designing and analyzing efficient algorithms emphasizing
methods used in practice. Topics include sorting, searching, dynamic programming,
greedy algorithms, advanced data structures, graph algorithms (shortest path,
spanning trees, tree traversals), matrix operations, string matching, NP completeness.
C2 : Computing III
Object-oriented programming. Classes, methods, polymorphism, inheritance.
Object-oriented design. C++. UNIX. Ethical and social issues.
f : (C1, C2) → n,  n ∈ [0, 1]  (1)
C1 is a course from an external institution.
C2 is a course offered at UML.
Knowledge Acquisition Bottleneck
Semantic relatedness measures that rely on a traditional knowledge base usually suffer from the knowledge acquisition bottleneck.
Knowledge acquisition is difficult for an expert
system [HRWL83]:
Representation mismatch: the difference between the way a human
expert states knowledge and the way it is represented in the system.
Knowledge inaccuracy: the difficulty for human experts to describe
knowledge in terms that are precise, complete, and consistent
enough for use in a computer program.
Coverage problem: the difficulty of characterizing all of the relevant
domain knowledge in a given representation system, even when the
expert is able to correctly verbalize the knowledge.
Maintenance trap: the time required to maintain a knowledge base.
Semantic Relatedness
Three terms have been used interchangeably in related literature:
semantic relatedness, semantic similarity, and semantic distance.
(Nested sets, outermost to innermost: semantic distance, semantic relatedness, semantic similarity.)
Figure: The relations of semantic distance, semantic relatedness, and semantic similarity [BH06].
Semantic Similarity versus Semantic Relatedness
Semantic Similarity:
  (animal, cat): close
  (human, cat): distant
Semantic Relatedness:
  (cat, paw): close
  (cat, hand): distant
Popular Knowledge Sources
1 Lexicon-based Resources
Dictionaries
Thesauri
WordNet
Cyc
2 Corpus-based Resources
Project Gutenberg
British National Corpus
Penn Treebank
3 Hybrid Resources
Wikipedia
Wiktionary
Related Work on Semantic Relatedness
1 Lexicon-based
Dictionary [KF93]
Thesaurus [MH91]
WordNet [WP94, LC98, HSO98, YP05]
2 Corpus-based
Query Expansion [SH06, BMI07, CV07]
LSA [LFL98]
HAL [BLL98]
PMI-IR [Tur01]
ESA (Wikipedia) [GM07, GM09]
3 Hybrid
Information Content [Res95]
Distributional profiling [Moh06, Moh08]
Li et al. [LBM03, LMB+ 06]
Ponzetto and Strube (Wikipedia) [PS07]
A Fragment of the WordNet Taxonomy
entity.n.01
  physical entity.n.01
    object.n.01
      part.n.02
        component.n.03
          crystal.n.02
            piezoelectric crystal.n.01
      whole.n.02
        artifact.n.01
          decoration.n.01
            adornment.n.01
              jewelry.n.01
                bracelet.n.02
                necklace.n.01
    matter.n.03
      solid.n.01
        crystal.n.01
          gem.n.02
            transparent gem.n.01
              diamond.n.02
The First Approach
1 Semantic relatedness between two concepts: based on
their path length and the depth of their common ancestor in
the WordNet taxonomy.
2 Semantic relatedness between two words: based on the
previous step, and includes POS and WSD.
3 Semantic relatedness between two sentences: constructs
two semantic vectors, and takes into account the information
content.
4 Word order similarity (optional): “a dog bites a man” & “a
man bites a dog”
5 Semantic relatedness between paragraphs
6 Semantic relatedness between courses
Concept Relatedness
Path function:
f1(p) = e^(−αp),  α ∈ [0, 1]  (2)
Depth function:
f2(h) = (e^(βh) − e^(−βh)) / (e^(βh) + e^(−βh)),  β ∈ [0, 1]  (3)
Semantic relatedness between concepts c1 and c2:
fword(c1, c2) = f1(p) · f2(h)  (4)
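Equations 2-4 can be sketched directly. The values α = 0.2 and β = 0.45 below are assumed for illustration; the slides only constrain both parameters to [0, 1]:

```python
import math

def concept_relatedness(p, h, alpha=0.2, beta=0.45):
    """Equations 2-4: relatedness of two concepts from their path
    length p and the depth h of their common ancestor in WordNet."""
    f1 = math.exp(-alpha * p)                            # Equation 2
    f2 = ((math.exp(beta * h) - math.exp(-beta * h)) /
          (math.exp(beta * h) + math.exp(-beta * h)))    # Equation 3, i.e. tanh(beta*h)
    return f1 * f2                                       # Equation 4

# Zero path length and a deep common ancestor score near 1;
# a long path with a shallow ancestor scores near 0.
print(concept_relatedness(0, 12), concept_relatedness(10, 1))
```

Note that Equation 3 is exactly tanh(βh), so the score saturates as the common ancestor gets deeper.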
Semantic Relatedness Between Words
Algorithm 1 Semantic Relatedness Between Words
1: If two words w1 and w2 have different POS, consider them semantically distant. Return 0.
2: If w1 and w2 have the same POS and look the same but do not exist in WordNet, consider them semantically close. Return 1.
3: Using either maximum scores or the first-sense heuristic to perform WSD, measure the semantic relatedness between w1 and w2 using Equation 4.
4: Using the same WSD strategy as the previous step, measure the semantic relatedness between the stemmed w1 and the stemmed w2 using Equation 4.
5: Return the larger of the two results in steps (3) and (4), i.e., the score of the pair that is semantically closer.
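A minimal sketch of Algorithm 1, omitting the stemming step (4). The `sense_pairs` and `in_wordnet` callables are stand-ins for WordNet lookups, and the α/β defaults are illustrative:

```python
import math

def concept_rel(p, h, alpha=0.2, beta=0.45):
    # Equation 4: path-length decay times the depth tanh curve.
    return math.exp(-alpha * p) * math.tanh(beta * h)

def word_relatedness(w1, pos1, w2, pos2, sense_pairs, in_wordnet):
    """Sketch of Algorithm 1 (stemming step omitted).
    sense_pairs(w1, w2) yields a (path length, ancestor depth) pair for
    each candidate pair of senses; in_wordnet(w) tests membership."""
    if pos1 != pos2:
        return 0.0                     # step 1: different POS -> distant
    if w1 == w2 and not in_wordnet(w1):
        return 1.0                     # step 2: same unknown word -> close
    # step 3, "maximum scores" WSD: the best-scoring sense pair wins
    return max((concept_rel(p, h) for p, h in sense_pairs(w1, w2)),
               default=0.0)
```

For example, two nouns whose closest senses are one step apart under a deep common ancestor score highly, while any noun/verb pair scores 0 immediately.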
Construct a List of Joint Words
To measure the semantic relatedness between sentences S1 and S2, first join them into a unique word set S with length n:
S = S1 ∪ S2 = {w1, w2, . . . , wn}.  (5)
S1 : introduction to computer programming
S2 : introduction to computing environments
S: introduction to computer programming computing environments
Construct a Lexical Semantic Vector
Algorithm 2 Lexical Semantic Vector ŝ1 for S1
1: for all words wi ∈ S do
2: if wi ∈ S1, set ŝ1i = 1, where ŝ1i ∈ ŝ1.
3: if wi ∉ S1, the semantic relatedness between wi and each word w1j ∈ S1 is calculated using Algorithm 1. Set ŝ1i to the highest score if the score exceeds a preset threshold δ (δ ∈ [0, 1]); otherwise ŝ1i = 0.
4: Let γ ∈ [1, n] be the maximum number of times a word w1j ∈ S1 is chosen as semantically the closest word of wi. Let the semantic relatedness of wi and w1j be d, and f1j be the number of times that w1j is chosen. If f1j > γ, set ŝ1i = d/f1j to give a penalty to w1j. This step is called ticketing.
5: end for
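A runnable sketch of Algorithm 2, using a toy relatedness function in place of Algorithm 1; the δ = 0.2 and γ = 2 defaults are illustrative, not values from the slides:

```python
def lexical_semantic_vector(S, S1, rel, delta=0.2, gamma=2):
    """Sketch of Algorithm 2 (before the TF-IDF weighting of Eq. 7).
    S: joint word list, S1: words of the first sentence,
    rel(w, w1j): word relatedness in [0, 1] (Algorithm 1 in the slides),
    delta: relatedness threshold, gamma: 'ticketing' quota."""
    s_hat = []
    times_chosen = {w: 0 for w in S1}
    for wi in S:
        if wi in S1:
            s_hat.append(1.0)                 # word occurs in S1 itself
            continue
        # find the semantically closest word of S1
        best_w, best_d = max(((w, rel(wi, w)) for w in S1),
                             key=lambda t: t[1])
        if best_d <= delta:                   # below threshold -> unrelated
            s_hat.append(0.0)
            continue
        times_chosen[best_w] += 1
        if times_chosen[best_w] > gamma:      # ticketing: penalize overuse
            s_hat.append(best_d / times_chosen[best_w])
        else:
            s_hat.append(best_d)
    return s_hat

S1 = ["introduction", "to", "computer", "programming"]
S  = S1 + ["computing", "environments"]
rel = lambda a, b: 0.9 if {a, b} == {"computing", "programming"} else 0.1
print(lexical_semantic_vector(S, S1, rel))
```

On the slide's example, "computing" inherits a high score from "programming", while "environments" falls below the threshold and gets 0.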
First-level Sentence Relatedness
TF-IDF:
TFIDF(wi) = tfi · idfi = tfi · log(N / dfi)  (6)
Semantic vector SV1 for sentence S1:
SV1i = ŝ1i · (TFIDF(wi) + ) · (TFIDF(w1j) + ),  i ∈ [1, n], j ∈ [1, t]  (7)
First-level Sentence Relatedness
fsent(1)(S1, S2) = (SV1 · SV2) / (||SV1|| · ||SV2||)  (8)
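Equations 6 and 8 can be sketched as plain functions; the vector values below are made up for illustration:

```python
import math

def tfidf(tf, df, N):
    # Equation 6: term frequency times inverse document frequency
    return tf * math.log(N / df)

def cosine(a, b):
    # Equation 8: cosine of the angle between two semantic vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Made-up semantic vectors for two short course descriptions.
SV1 = [0.9, 0.4, 0.0, 0.7]
SV2 = [0.8, 0.0, 0.5, 0.7]
print(cosine(SV1, SV2))
```

Because every entry of a semantic vector is non-negative, the first-level score always falls in [0, 1].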
Second-level Sentence Relatedness
Word order similarity:
forder(S1, S2) = 1 − ||Q1 − Q2|| / ||Q1 + Q2||  (9)
Q1, Q2: word order vectors of S1 and S2.
Second-level sentence relatedness:
fsent(2)(S1, S2) = τ · fsent(1)(S1, S2) + (1 − τ) · forder(S1, S2),  τ ∈ [0, 1]  (10)
Semantic Relatedness Between Paragraphs
n m
i=1 (maxj=1 fsent (s1i , s2j )) · Ni
fpara (P1 , P2 ) = n (11)
i=1 Ni
Algorithm 3 Semantic Relatedness for Paragraphs
1: If deletion is enabled, given two course descriptions, select the one with
fewer sentences as P1 , and the other as P2 . If deletion is disabled,
select the first course description as P1 , and the other as P2 .
2: for each sentence s1i ∈ P1 do
3: Calculate the semantic relatedness between sentences using
equation 10 for s
1i and each of the sentences in P2 .
4: Find the sentence pair s1i , s2j (s2j ∈ P2 ) that scores the highest.
Save the highest score and the total number of words of s1i and
s2j . If deletion is enabled, remove sentence s2j from P2 .
5: end for
6: Collect the highest score and the number of words from each run.
Use their weighted mean from equation 11 as the semantic relatedness
between P1 and P2 .
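Algorithm 3 and equation 11 together can be sketched as below. The Jaccard-overlap stand-in for the sentence measure is only for illustration; the real measure is equation 10:

```python
def paragraph_relatedness(p1, p2, f_sent, deletion=True):
    """Sketch of Algorithm 3: greedily match each sentence of P1 to its
    best-scoring sentence in P2, then take the word-count-weighted mean
    of the best scores (equation 11). p1, p2 are lists of word lists."""
    if deletion and len(p2) < len(p1):
        p1, p2 = p2, p1          # step 1: the shorter description becomes P1
    remaining = list(p2)
    scores, weights = [], []
    for s1 in p1:                # steps 2-5
        best = max(range(len(remaining)), key=lambda j: f_sent(s1, remaining[j]))
        scores.append(f_sent(s1, remaining[best]))
        weights.append(len(s1) + len(remaining[best]))   # N_i
        if deletion:
            remaining.pop(best)  # a sentence in P2 is matched at most once
    # step 6: weighted mean (equation 11)
    return sum(s * n for s, n in zip(scores, weights)) / sum(weights)

# Illustrative stand-in for equation 10: plain word overlap between sentences.
jaccard = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))
```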
Semantic Relatedness Between Courses
    f_course(C1, C2) = θ · f_sent(T1, T2) + (1 − θ) · f_para(P1, P2),  θ ∈ [0, 1]        (12)
Data sets
Data set    MCC courses    UML courses    Total
Small                25             24       49
Medium               55             50      105
Large               108             89      197

Table: Number of courses in the data sets
Experimental Results
Compared against the method by Li et al. [LMB+ 06] and
TF-IDF [SB88]:
[Figure: Accuracy comparison (left) and average ranks of the real equivalent courses (right) for the proposed approach with word order enabled/disabled, TF-IDF, and Li et al., over 49, 105, and 197 documents.]
Experimental Results
Performance of two word sense disambiguation algorithms:
[Figure: Accuracy comparison of the FIRST SENSE and MAX word sense disambiguation algorithms over 49, 105, and 197 documents.]
What’s Wrong with WordNet?
91.304 Foundations of Computer Science
A survey of the mathematical foundations of Computer Science. Finite
automata and regular languages. Stack Acceptors and Context-Free
Languages. Turing Machines, recursive and recursively enumerable sets.
Decidability. Complexity. This course involves no computer programming.
64 unfiltered words fetched from WordNet
acceptor, adjust, arrange, automaton, basis, batch, bent, calculator, car,
class, complexity, computer, countable, course, determine, dress, even,
finite, fix, foundation, foundation garment, fructify, hardening, imply,
initiation, involve, jell, language, linguistic process, lyric, machine,
mathematical, naturally, necessitate, numerical, path, place, plant,
push-down list, push-down storage, put, recursive, regular, review, rig,
run, science, set, set up, sic, sketch, skill, smokestack, specify, speech,
stack, stage set, surveil, survey, terminology, turing, typeset,
unconstipated, view.
What’s Wrong with WordNet?
91.304 Foundations of Computer Science
A survey of the mathematical foundations of Computer Science. Finite
automata and regular languages. Stack Acceptors and Context-Free
Languages. Turing Machines, recursive and recursively enumerable sets.
Decidability. Complexity. This course involves no computer programming.
18 articles fetched from Wikipedia using the second approach
Alan Turing, Algorithm, Automata theory, Complexity, Computer,
Computer science, Context-free language, Enumeration, Finite set,
Finite-state machine, Kolmogorov complexity, Language, Machine,
Mathematics, Recursive, Recursive language, Recursively enumerable set,
Set theory.
Growth of Wikipedia and WordNet over the years
[Figure: Growth of English Wikipedia (article count) versus WordNet (synset count), 1992–2012.]
WordNet versus Wikipedia
Fragments of WordNet and Wikipedia Taxonomies
WordNet [Root: synset("technology"), depth 2]: 25 nodes
Wikipedia [Centroid: "Category:Technology", steps 2]: 3,583 nodes
Extract a Lexicographical Hierarchy from Wikipedia
1 Let’s assume the knowledge domain is specified, e.g.,
“Category:Computer science.”
2 Choose its parent as the root, i.e., “Category:Applied
sciences.”
3 Use a depth-limited search to recursively traverse each
subcategory (including subpages) to build a lexicographical
hierarchy with depth D.
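The traversal in step 3 amounts to a depth-limited search over the category graph. A toy sketch with an invented in-memory graph (the actual system walks Wikipedia's category dump):

```python
def build_hierarchy(root, children, max_depth):
    """Collect all categories/pages within max_depth steps of root,
    avoiding revisits (Wikipedia's category graph contains cycles)."""
    seen = {root}
    frontier = [(root, 0)]
    while frontier:
        node, depth = frontier.pop()
        if depth == max_depth:
            continue
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                frontier.append((child, depth + 1))
    return seen

# Invented toy graph standing in for the real category dump:
toy = {
    "Category:Applied sciences": ["Category:Computer science", "Category:Engineering"],
    "Category:Computer science": ["Category:Algorithms"],
}
```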
Growth of the Hierarchy from Wikipedia
[Figure: Growth of the lexicographical hierarchy constructed from Wikipedia, illustrated in circular trees (depth 1: 72 total nodes; depth 2: 4,249; depth 3: 64,407). Lighter nodes and edges lie at deeper depths in the hierarchy.]
Lexicographical Hierarchy constructed from Wikipedia
Depth (D)    Number of concepts at this level
1                       71
2                    4,177
3                   60,158
4                  177,955
5                  494,039
6                1,848,052

Table: Number of concepts for each depth in the "Category:Applied sciences" hierarchy.
The hierarchy includes only 1,534,267 distinct articles out of
5,329,186 articles in Wikipedia ⇒ over 71% of Wikipedia
articles are eliminated.
Generate Course Description Features
Algorithm 4 Feature Generation (F ) for Course C
1: Tc ← ∅ (clear terms), Ta ← ∅ (ambiguous terms).
2: Generate all possible n-grams (n ∈ [1, 3]) G from C.
3: Fetch the pages whose titles match any g ∈ G from Wikipedia redirection
data. For each page pid of term t, Tc ← Tc ∪ {t : pid}.
4: Fetch the pages whose titles match any g ∈ G from Wikipedia page title
data. If a page is a disambiguation page, include all the terms it refers to. If a
page pid corresponds to a term t that is not ambiguous, Tc ← Tc ∪ {t : pid};
else Ta ← Ta ∪ {t : pid}.
5: For each term ta ∈ Ta , find the disambiguation that is on average most
related using Equation 4 to the set of clear terms. If a page pid of ta is
on average the most related to the terms in Tc , and the relatedness score is
above a threshold δ (δ ∈ [0, 1]), set Tc ← Tc ∪ {ta : pid}. If ta and a clear
term are different senses of the same term, keep the one that is more related
to all the other clear terms.
6: Return clear terms as features.
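Step 2 of Algorithm 4 (candidate term generation) can be sketched as below; the tokenization is deliberately simplistic:

```python
def candidate_ngrams(description, max_n=3):
    """Step 2 of Algorithm 4: all n-grams (n in [1, 3]) from a course
    description, to be matched against Wikipedia titles and redirects."""
    words = description.lower().split()
    return {" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}
```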
Example of Course Features
C1 : {1134:“Analysis”, 775:“Algorithm”}
{41985:“Shortest path problem”, 597584:“Tree traversal”, 455770:“Spanning tree”,
18955875:“Tree”, 1134:“Analysis”, 18568:“List of algorithms”,
56054:“Completeness”, 775:“Algorithm”, 144656:“Sorting”, 8519:“Data structure”,
93545:“Structure”, 8560:“Design”, 18985040:“Data”}
C2 : {5213:“Computing”}
{21347364:“Unix”, 289862:“Social”, 9258:“Ethics”, 6111038:“Object-oriented
design”, 5311:“Computer programming”, 72038:“C++”, 27471338:“Object-oriented
programming”, 8560:“Design”}
Lexical Semantic Vector
An algorithm similar to Algorithm 2 is used to determine each
value of an entry of the lexical semantic vector ŝ1i for features F1.
A semantic vector is defined as:
    SV1i = ŝ1i · I(ti) · I(tj)        (13)
Information Content
Information content I(t) of a term t:
    I(t) = γ · Ic(t) + (1 − γ) · Il(t)        (14)
Category information content Ic(t):
    Ic(t) = 1 − log(siblings(t) + 1) / log(N)        (15)
Linkage information content Il(t):
    Il(t) = 1 − (inlinks(pid) / MAXIN) · (outlinks(pid) / MAXOUT)        (16)
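A hedged sketch of equations 14–16; the default γ = 0.5 here is only an example value, and the sibling/link counts are invented:

```python
import math

def category_ic(siblings, n_total):
    """Equation 15: fewer siblings -> more specific -> higher IC."""
    return 1.0 - math.log(siblings + 1) / math.log(n_total)

def linkage_ic(inlinks, outlinks, max_in, max_out):
    """Equation 16: the most heavily linked pages get the lowest IC."""
    return 1.0 - (inlinks / max_in) * (outlinks / max_out)

def information_content(siblings, n_total, inlinks, outlinks,
                        max_in, max_out, gamma=0.5):
    """Equation 14: weighted mix of category and linkage IC."""
    return (gamma * category_ic(siblings, n_total)
            + (1.0 - gamma) * linkage_ic(inlinks, outlinks, max_in, max_out))
```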
Determine Course Relatedness
    f(C1, C2) = (SV1 · SV2) / (||SV1|| · ||SV2||)        (17)

    f(course1, course2) = [f(T1, T2) · (||FT1|| + ||FT2||) + f(C1, C2) · (||FC1|| + ||FC2||)]
                          / (||FT1|| + ||FT2|| + ||FC1|| + ||FC2||) + Ω        (18)
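Equation 18 weights the title relatedness and the description relatedness by the sizes of the respective feature sets; a minimal sketch in which Ω is passed through unchanged:

```python
def course_relatedness(f_title, f_desc, n_ft1, n_ft2, n_fc1, n_fc2, omega=0.0):
    """Equation 18: n_ft*/n_fc* are the title / description feature-set
    sizes ||F_T||, ||F_C||; omega is the adjustment term from the slide."""
    total = n_ft1 + n_ft2 + n_fc1 + n_fc2
    return (f_title * (n_ft1 + n_ft2) + f_desc * (n_fc1 + n_fc2)) / total + omega
```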
Experimental Results
Randomly select 25 CS courses from 19 universities that can
be transferred to UML according to the transfer dictionary.
Each transfer course is compared to all 44 CS courses offered
at UML.
The result is considered correct if the real equivalent course at
UML is among the top 3 in the list of highest scores.
Algorithm Accuracy
Proposed approach 72%
Li et al. [LMB+ 06] 52%
TF-IDF 32%
Table: Accuracy of the second approach against those of Li et al. and
TF-IDF
Experimental Results
Algorithm                            Pearson's correlation    p-value
TF-IDF                               0.730                    2·10^−6
Li et al. [LMB+06]                   0.570                    0.0006
Proposed approach (Features)         0.845                    1.13·10^−9
Proposed approach (Features + IC)    0.851                    6.65·10^−10

Table: Pearson's correlation of course relatedness scores with human
judgments.
Sensitivity Test
Testing the Sensitivity of Parameters α, β, and δ
[Figure: Pearson correlation as each parameter varies over 0.1–0.9 while the others are held fixed: α (β = 0.5, δ = 0.2), β (α = 0.2, δ = 0.2), and δ (α = 0.2, β = 0.5).]
Summary
Highlighted the problem of suggesting transfer course
equivalencies.
Proposed two semantic relatedness measures to tackle the
problem.
A semantic relatedness measure based on traditional
knowledge sources can be adapted.
Wikipedia is a better knowledge source than traditional
knowledge sources.
A domain-specific semantic relatedness measure built on top
of Wikipedia is well suited to suggesting transfer course
equivalencies.
Provided a human judgment data set over 32 pairs of courses:
http://bit.ly/semcourse.
Published Literature
Using Semantic Distance to Automatically Suggest Transfer Course
Equivalencies
Beibei Yang and Jesse M. Heines
ACL-HLT 2011: Proceedings of the Sixth Workshop on Innovative
Use of NLP for Building Educational Applications (BEA-6)
Association for Computational Linguistics
Domain-Specific Semantic Relatedness from Wikipedia: Can a
Course be Transferred?
Beibei Yang and Jesse M. Heines
NAACL-HLT 2012 Student Research Workshop
References
Bibliography I
Alexander Budanitsky and Graeme Hirst.
Evaluating WordNet-based measures of lexical semantic relatedness.
Computational Linguistics, 32:13–47, 2006.
Curt Burgess, Kay Livesay, and Kevin Lund.
Explorations in context space: words, sentences, discourse.
Discourse Processes, 25:211–257, 1998.
Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka.
Measuring semantic similarity between words using web search engines.
In Proceedings of the 16th International Conference on World Wide Web, pages 757–766, New York, NY,
USA, 2007. ACM.
Rudi L. Cilibrasi and Paul M. B. Vitanyi.
The Google similarity distance.
IEEE Transactions on Knowledge and Data Engineering, 19:370–383, 2007.
Evgeniy Gabrilovich and Shaul Markovitch.
Computing semantic relatedness using Wikipedia-based explicit semantic analysis.
In Proceedings of the 20th International Joint Conference on AI, 2007.
Evgeniy Gabrilovich and Shaul Markovitch.
Wikipedia-based semantic interpretation for NLP.
Journal of Artificial Intelligence Research, 34:443–498, 2009.
Frederick Hayes-Roth, Donald A. Waterman, and Douglas B. Lenat.
Building expert systems.
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1983.
References
Bibliography II
Graeme Hirst and David St-Onge.
WordNet: An electronic lexical database, chapter Lexical chains as representations of context for the
detection and correction of malapropisms, pages 305–332.
The MIT Press, Cambridge, MA, 1998.
Hideki Kozima and Teiji Furugori.
Similarity between words computed by spreading activation on an English dictionary.
In Proceedings of the 6th conference on European chapter of the Association for Computational Linguistics,
EACL ’93, pages 232–239, Stroudsburg, PA, USA, 1993. Association for Computational Linguistics.
Yuhua Li, Zuhair A. Bandar, and David McLean.
An approach for measuring semantic similarity between words using multiple information sources.
IEEE Transactions on Knowledge and Data Engineering, pages 871–882, 2003.
Claudia Leacock and Martin Chodorow.
Combining local context and WordNet similarity for word sense identification, pages 265–283.
The MIT Press, Cambridge, MA, 1998.
Thomas K Landauer, Peter W. Foltz, and Darrell Laham.
An introduction to latent semantic analysis.
Discourse Processes, 25(2-3):259–284, 1998.
Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett.
Sentence similarity based on semantic nets and corpus statistics.
IEEE Transactions on Knowledge and Data Engineering, 18(8):1138–1150, 2006.
References
Bibliography III
Jane Morris and Graeme Hirst.
Lexical cohesion computed by thesaural relations as an indicator of the structure of text.
Computational Linguistics, 17(1):21–48, March 1991.
Saif Mohammad and Graeme Hirst.
Distributional measures of concept-distance: A task-oriented evaluation.
In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, 2006.
Saif Mohammad.
Measuring Semantic Distance Using Distributional Profiles of Concepts.
PhD thesis, University of Toronto, Toronto, Canada, 2008.
Simone Paolo Ponzetto and Michael Strube.
Knowledge derived from Wikipedia for computing semantic relatedness.
Journal of Artificial Intelligence Research, 30:181–212, October 2007.
Philip Resnik.
Using information content to evaluate semantic similarity in a taxonomy.
In Proceedings of the 14th international joint conference on Artificial intelligence, volume 1 of IJCAI’95,
pages 448–453, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc.
Gerard Salton and Christopher Buckley.
Term weighting approaches in automatic text retrieval.
Information Processing and Management, 24:513–523, August 1988.
Mehran Sahami and Timothy D. Heilman.
A web-based kernel function for measuring the similarity of short text snippets.
In Proceedings of the 15th International Conference on the World Wide Web, pages 377–386, New York,
NY, USA, 2006. ACM.
References
Bibliography IV
Peter D. Turney.
Mining the web for synonyms: PMI-IR versus LSA on TOEFL.
In Luc De Raedt and Peter A. Flach, editors, ECML, volume 2167 of Lecture Notes in Computer Science,
pages 491–502. Springer, 2001.
Zhibiao Wu and Martha Palmer.
Verb semantics and lexical selection.
In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 133–138, 1994.
Dongqiang Yang and David M. W. Powers.
Measuring semantic similarity in the taxonomy of WordNet.
In Proceedings of the 28th Australasian Conference on Computer Science, volume 38, pages 315–322,
Darlinghurst, Australia, 2005. Australian Computer Society, Inc.