This document presents a doctoral dissertation defense on using semantic relatedness to evaluate course equivalencies. The presentation includes an introduction, surveys knowledge sources and related work, describes two approaches to measuring semantic relatedness between courses, and discusses experimental results comparing the approaches.
Semantic Relatedness for Evaluation of Course Equivalencies
1. Introduction Knowledge Sources Related Work First Approach Second Approach Summary References
Semantic Relatedness for Evaluation of Course
Equivalencies
Doctoral Dissertation Defense
Beibei Yang
Department of Computer Science
University of Massachusetts Lowell
July 23, 2012
Outline
1 Introduction
2 Knowledge Sources
3 Related Work
4 First Approach
5 Second Approach
6 Summary
NLP and Education
Many NLP techniques have been adapted to the education field for:
automated scoring and evaluation
intelligent tutoring
learner cognition
However, few techniques address the identification of transfer
course equivalencies.
Why is it important to suggest transfer course
equivalencies?
National Association for College Admission Counseling, 2010
“. . . less attention is focused on the transfer admission process,
which affects approximately one-third of students beginning at
either a four- or two-year institution during the course of their
postsecondary careers.”
National Center for Education Statistics, 2005
“For students who attained their bachelor’s degrees in 1999–2000,
59.7 percent attended more than one institution during their
undergraduate careers and 32.1 percent transferred at least once.”
UML’s course transfer dictionary
Course descriptions
C1 : Analysis of Algorithms
Discusses basic methods for designing and analyzing efficient algorithms emphasizing
methods used in practice. Topics include sorting, searching, dynamic programming,
greedy algorithms, advanced data structures, graph algorithms (shortest path,
spanning trees, tree traversals), matrix operations, string matching, NP completeness.
C2 : Computing III
Object-oriented programming. Classes, methods, polymorphism, inheritance.
Object-oriented design. C++. UNIX. Ethical and social issues.
f : (C1, C2) → n,  n ∈ [0, 1]  (1)
C1 is a course from an external institution.
C2 is a course offered at UML.
Knowledge Acquisition Bottleneck
Semantic relatedness measures that rely on a traditional knowledge base usually suffer from the knowledge acquisition bottleneck.
Knowledge acquisition is difficult for an expert
system [HRWL83]:
Representation mismatch: the difference between the way a human
expert states knowledge and the way it is represented in the system.
Knowledge inaccuracy: the difficulty for human experts to describe
knowledge in terms that are precise, complete, and consistent
enough for use in a computer program.
Coverage problem: the difficulty of characterizing all of the relevant
domain knowledge in a given representation system, even when the
expert is able to correctly verbalize the knowledge.
Maintenance trap: the time required to maintain a knowledge base.
Semantic Relatedness
Three terms have been used interchangeably in related literature:
semantic relatedness, semantic similarity, and semantic distance.
(Nested sets, outermost to innermost: semantic distance, semantic relatedness, semantic similarity.)
Figure: The relations of semantic distance, semantic relatedness, and semantic similarity [BH06].
Semantic Similarity versus Semantic Relatedness
Semantic Similarity:
  (animal, cat): close
  (human, cat): distant
Semantic Relatedness:
  (cat, paw): close
  (cat, hand): distant
Popular Knowledge Sources
1 Lexicon-based Resources
Dictionaries
Thesauri
WordNet
Cyc
2 Corpus-based Resources
Project Gutenberg
British National Corpus
Penn Treebank
3 Hybrid Resources
Wikipedia
Wiktionary
Related Work on Semantic Relatedness
1 Lexicon-based
Dictionary [KF93]
Thesaurus [MH91]
WordNet [WP94, LC98, HSO98, YP05]
2 Corpus-based
Query Expansion [SH06, BMI07, CV07]
LSA [LFL98]
HAL [BLL98]
PMI-IR [Tur01]
ESA (Wikipedia) [GM07, GM09]
3 Hybrid
Information Content [Res95]
Distributional profiling [Moh06, Moh08]
Li et al. [LBM03, LMB+ 06]
Ponzetto and Strube (Wikipedia) [PS07]
A Fragment of the WordNet Taxonomy
entity.n.01
  physical entity.n.01
    object.n.01
      part.n.02
        component.n.03
          crystal.n.02
            piezoelectric crystal.n.01
      whole.n.02
        artifact.n.01
          decoration.n.01
            adornment.n.01
              jewelry.n.01
                bracelet.n.02
                necklace.n.01
    matter.n.03
      solid.n.01
        crystal.n.01
          gem.n.02
            transparent gem.n.01
              diamond.n.02
The First Approach
1 Semantic relatedness between two concepts: based on
their path length and the depth of their common ancestor in
the WordNet taxonomy.
2 Semantic relatedness between two words: based on the
previous step, and includes POS and WSD.
3 Semantic relatedness between two sentences: constructs
two semantic vectors, and takes into account the information
content.
4 Word order similarity (optional): “a dog bites a man” & “a
man bites a dog”
5 Semantic relatedness between paragraphs
6 Semantic relatedness between courses
Concept Relatedness
Path function:
f1(p) = e^(−αp),  α ∈ [0, 1]  (2)
Depth function:
f2(h) = (e^(βh) − e^(−βh)) / (e^(βh) + e^(−βh)),  β ∈ [0, 1]  (3)
Semantic relatedness between concepts c1 and c2:
fword(c1, c2) = f1(p) · f2(h)  (4)
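Equations 2-4 can be sketched directly. The values α = 0.2 and β = 0.45 below are assumed for illustration; the slides only constrain both parameters to [0, 1]:

```python
import math

def concept_relatedness(p, h, alpha=0.2, beta=0.45):
    """Equations 2-4: relatedness of two concepts from their path
    length p and the depth h of their common ancestor in WordNet."""
    f1 = math.exp(-alpha * p)                            # Equation 2
    f2 = ((math.exp(beta * h) - math.exp(-beta * h)) /
          (math.exp(beta * h) + math.exp(-beta * h)))    # Equation 3, i.e. tanh(beta*h)
    return f1 * f2                                       # Equation 4

# Zero path length and a deep common ancestor score near 1;
# a long path with a shallow ancestor scores near 0.
print(concept_relatedness(0, 12), concept_relatedness(10, 1))
```

Note that Equation 3 is exactly tanh(βh), so the score saturates as the common ancestor gets deeper.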
Semantic Relatedness Between Words
Algorithm 1 Semantic Relatedness Between Words
1: If two words w1 and w2 have different POS, consider them semantically distant. Return 0.
2: If w1 and w2 have the same POS and look the same but do not exist in WordNet, consider them semantically close. Return 1.
3: Using either maximum scores or the first-sense heuristic to perform WSD, measure the semantic relatedness between w1 and w2 using Equation 4.
4: Using the same WSD strategy as the previous step, measure the semantic relatedness between the stemmed w1 and the stemmed w2 using Equation 4.
5: Return the larger of the two results in steps (3) and (4), i.e., the score of the pair that is semantically closer.
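A minimal sketch of Algorithm 1, omitting the stemming step (4). The `sense_pairs` and `in_wordnet` callables are stand-ins for WordNet lookups, and the α/β defaults are illustrative:

```python
import math

def concept_rel(p, h, alpha=0.2, beta=0.45):
    # Equation 4: path-length decay times the depth tanh curve.
    return math.exp(-alpha * p) * math.tanh(beta * h)

def word_relatedness(w1, pos1, w2, pos2, sense_pairs, in_wordnet):
    """Sketch of Algorithm 1 (stemming step omitted).
    sense_pairs(w1, w2) yields a (path length, ancestor depth) pair for
    each candidate pair of senses; in_wordnet(w) tests membership."""
    if pos1 != pos2:
        return 0.0                     # step 1: different POS -> distant
    if w1 == w2 and not in_wordnet(w1):
        return 1.0                     # step 2: same unknown word -> close
    # step 3, "maximum scores" WSD: the best-scoring sense pair wins
    return max((concept_rel(p, h) for p, h in sense_pairs(w1, w2)),
               default=0.0)
```

For example, two nouns whose closest senses are one step apart under a deep common ancestor score highly, while any noun/verb pair scores 0 immediately.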
Construct a List of Joint Words
To measure the semantic relatedness between sentences S1 and S2, first join them into a unique word set S with length n:
S = S1 ∪ S2 = {w1, w2, . . . , wn}.  (5)
S1 : introduction to computer programming
S2 : introduction to computing environments
S: introduction to computer programming computing environments
Construct a Lexical Semantic Vector
Algorithm 2 Lexical Semantic Vector ŝ1 for S1
1: for all words wi ∈ S do
2: if wi ∈ S1, set ŝ1i = 1, where ŝ1i ∈ ŝ1.
3: if wi ∉ S1, the semantic relatedness between wi and each word w1j ∈ S1 is calculated using Algorithm 1. Set ŝ1i to the highest score if the score exceeds a preset threshold δ (δ ∈ [0, 1]); otherwise ŝ1i = 0.
4: Let γ ∈ [1, n] be the maximum number of times a word w1j ∈ S1 is chosen as semantically the closest word of wi. Let the semantic relatedness of wi and w1j be d, and f1j be the number of times that w1j is chosen. If f1j > γ, set ŝ1i = d/f1j to give a penalty to w1j. This step is called ticketing.
5: end for
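A runnable sketch of Algorithm 2, using a toy relatedness function in place of Algorithm 1; the δ = 0.2 and γ = 2 defaults are illustrative, not values from the slides:

```python
def lexical_semantic_vector(S, S1, rel, delta=0.2, gamma=2):
    """Sketch of Algorithm 2 (before the TF-IDF weighting of Eq. 7).
    S: joint word list, S1: words of the first sentence,
    rel(w, w1j): word relatedness in [0, 1] (Algorithm 1 in the slides),
    delta: relatedness threshold, gamma: 'ticketing' quota."""
    s_hat = []
    times_chosen = {w: 0 for w in S1}
    for wi in S:
        if wi in S1:
            s_hat.append(1.0)                 # word occurs in S1 itself
            continue
        # find the semantically closest word of S1
        best_w, best_d = max(((w, rel(wi, w)) for w in S1),
                             key=lambda t: t[1])
        if best_d <= delta:                   # below threshold -> unrelated
            s_hat.append(0.0)
            continue
        times_chosen[best_w] += 1
        if times_chosen[best_w] > gamma:      # ticketing: penalize overuse
            s_hat.append(best_d / times_chosen[best_w])
        else:
            s_hat.append(best_d)
    return s_hat

S1 = ["introduction", "to", "computer", "programming"]
S  = S1 + ["computing", "environments"]
rel = lambda a, b: 0.9 if {a, b} == {"computing", "programming"} else 0.1
print(lexical_semantic_vector(S, S1, rel))
```

On the slide's example, "computing" inherits a high score from "programming", while "environments" falls below the threshold and gets 0.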
First-level Sentence Relatedness
TF-IDF:
TFIDF(wi) = tfi · idfi = tfi · log(N / dfi)  (6)
Semantic vector SV1 for sentence S1:
SV1i = ŝ1i · (TFIDF(wi) + ) · (TFIDF(w1j) + ),  i ∈ [1, n], j ∈ [1, t]  (7)
First-level Sentence Relatedness
fsent(1)(S1, S2) = (SV1 · SV2) / (||SV1|| · ||SV2||)  (8)
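Equations 6 and 8 can be sketched as plain functions; the vector values below are made up for illustration:

```python
import math

def tfidf(tf, df, N):
    # Equation 6: term frequency times inverse document frequency
    return tf * math.log(N / df)

def cosine(a, b):
    # Equation 8: cosine of the angle between two semantic vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Made-up semantic vectors for two short course descriptions.
SV1 = [0.9, 0.4, 0.0, 0.7]
SV2 = [0.8, 0.0, 0.5, 0.7]
print(cosine(SV1, SV2))
```

Because every entry of a semantic vector is non-negative, the first-level score always falls in [0, 1].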
Second-level Sentence Relatedness
Word order similarity:
forder(S1, S2) = 1 − ||Q1 − Q2|| / ||Q1 + Q2||  (9)
Q1, Q2: word order vectors of S1 and S2.
Second-level sentence relatedness:
fsent(2)(S1, S2) = τ · fsent(1)(S1, S2) + (1 − τ) · forder(S1, S2),  τ ∈ [0, 1]  (10)
Semantic Relatedness Between Paragraphs
n m
i=1 (maxj=1 fsent (s1i , s2j )) · Ni
fpara (P1 , P2 ) = n (11)
i=1 Ni
Algorithm 3 Semantic Relatedness for Paragraphs
1: If deletion is enabled, given two course descriptions, select the one with
fewer sentences as P1 , and the other as P2 . If deletion is disabled,
select the first course description as P1 , and the other as P2 .
2: for each sentence s1i ∈ P1 do
3: Calculate the semantic relatedness between sentences using
equation 10 for s
1i and each of the sentences in P2 .
4: Find the sentence pair s1i , s2j (s2j ∈ P2 ) that scores the highest.
Save the highest score and the total number of words of s1i and
s2j . If deletion is enabled, remove sentence s2j from P2 .
5: end for
6: Collect the highest score and the number of words from each run.
Use their weighted mean from equation 11 as the semantic relatedness
between P1 and P2 .
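Algorithm 3 and equation 11 together can be sketched as below. The Jaccard-overlap stand-in for the sentence measure is only for illustration; the real measure is equation 10:

```python
def paragraph_relatedness(p1, p2, f_sent, deletion=True):
    """Sketch of Algorithm 3: greedily match each sentence of P1 to its
    best-scoring sentence in P2, then take the word-count-weighted mean
    of the best scores (equation 11). p1, p2 are lists of word lists."""
    if deletion and len(p2) < len(p1):
        p1, p2 = p2, p1          # step 1: the shorter description becomes P1
    remaining = list(p2)
    scores, weights = [], []
    for s1 in p1:                # steps 2-5
        best = max(range(len(remaining)), key=lambda j: f_sent(s1, remaining[j]))
        scores.append(f_sent(s1, remaining[best]))
        weights.append(len(s1) + len(remaining[best]))   # N_i
        if deletion:
            remaining.pop(best)  # a sentence in P2 is matched at most once
    # step 6: weighted mean (equation 11)
    return sum(s * n for s, n in zip(scores, weights)) / sum(weights)

# Illustrative stand-in for equation 10: plain word overlap between sentences.
jaccard = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))
```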
Semantic Relatedness Between Courses
    f_course(C1, C2) = θ · f_sent(T1, T2) + (1 − θ) · f_para(P1, P2),  θ ∈ [0, 1]        (12)
Data sets
Data set    MCC courses    UML courses    Total
Small                25             24       49
Medium               55             50      105
Large               108             89      197

Table: Number of courses in the data sets
Experimental Results
Compared against the method by Li et al. [LMB+ 06] and
TF-IDF [SB88]:
[Figure: Accuracy comparison (left) and average ranks of the real equivalent courses (right) for the proposed approach with word order enabled/disabled, TF-IDF, and Li et al., over 49, 105, and 197 documents.]
Experimental Results
Performance of two word sense disambiguation algorithms:
[Figure: Accuracy comparison of the FIRST SENSE and MAX word sense disambiguation algorithms over 49, 105, and 197 documents.]
What’s Wrong with WordNet?
91.304 Foundations of Computer Science
A survey of the mathematical foundations of Computer Science. Finite
automata and regular languages. Stack Acceptors and Context-Free
Languages. Turing Machines, recursive and recursively enumerable sets.
Decidability. Complexity. This course involves no computer programming.
64 unfiltered words fetched from WordNet
acceptor, adjust, arrange, automaton, basis, batch, bent, calculator, car,
class, complexity, computer, countable, course, determine, dress, even,
finite, fix, foundation, foundation garment, fructify, hardening, imply,
initiation, involve, jell, language, linguistic process, lyric, machine,
mathematical, naturally, necessitate, numerical, path, place, plant,
push-down list, push-down storage, put, recursive, regular, review, rig,
run, science, set, set up, sic, sketch, skill, smokestack, specify, speech,
stack, stage set, surveil, survey, terminology, turing, typeset,
unconstipated, view.
What’s Wrong with WordNet?
91.304 Foundations of Computer Science
A survey of the mathematical foundations of Computer Science. Finite
automata and regular languages. Stack Acceptors and Context-Free
Languages. Turing Machines, recursive and recursively enumerable sets.
Decidability. Complexity. This course involves no computer programming.
18 articles fetched from Wikipedia using the second approach
Alan Turing, Algorithm, Automata theory, Complexity, Computer,
Computer science, Context-free language, Enumeration, Finite set,
Finite-state machine, Kolmogorov complexity, Language, Machine,
Mathematics, Recursive, Recursive language, Recursively enumerable set,
Set theory.
Growth of Wikipedia and WordNet over the years
[Figure: Growth of English Wikipedia (article count) versus WordNet (synset count), 1992–2012.]
WordNet versus Wikipedia
Fragments of WordNet and Wikipedia Taxonomies
WordNet [Root: synset("technology"), depth 2]: 25 nodes
Wikipedia [Centroid: "Category:Technology", steps 2]: 3,583 nodes
Extract a Lexicographical Hierarchy from Wikipedia
1 Let’s assume the knowledge domain is specified, e.g.,
“Category:Computer science.”
2 Choose its parent as the root, i.e., “Category:Applied
sciences.”
3 Use a depth-limited search to recursively traverse each
subcategory (including subpages) to build a lexicographical
hierarchy with depth D.
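The traversal in step 3 amounts to a depth-limited search over the category graph. A toy sketch with an invented in-memory graph (the actual system walks Wikipedia's category dump):

```python
def build_hierarchy(root, children, max_depth):
    """Collect all categories/pages within max_depth steps of root,
    avoiding revisits (Wikipedia's category graph contains cycles)."""
    seen = {root}
    frontier = [(root, 0)]
    while frontier:
        node, depth = frontier.pop()
        if depth == max_depth:
            continue
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                frontier.append((child, depth + 1))
    return seen

# Invented toy graph standing in for the real category dump:
toy = {
    "Category:Applied sciences": ["Category:Computer science", "Category:Engineering"],
    "Category:Computer science": ["Category:Algorithms"],
}
```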
Growth of the Hierarchy from Wikipedia
[Figure: Growth of the lexicographical hierarchy constructed from Wikipedia, illustrated in circular trees (depth 1: 72 total nodes; depth 2: 4,249; depth 3: 64,407). Lighter nodes and edges lie at deeper depths in the hierarchy.]
Lexicographical Hierarchy constructed from Wikipedia
Depth (D)    Number of concepts at this level
1                       71
2                    4,177
3                   60,158
4                  177,955
5                  494,039
6                1,848,052

Table: Number of concepts for each depth in the "Category:Applied sciences" hierarchy.
The hierarchy includes only 1,534,267 distinct articles out of
5,329,186 articles in Wikipedia ⇒ over 71% of Wikipedia
articles are eliminated.
Generate Course Description Features
Algorithm 4 Feature Generation (F ) for Course C
1: Tc ← ∅ (clear terms), Ta ← ∅ (ambiguous terms).
2: Generate all possible n-grams (n ∈ [1, 3]) G from C.
3: Fetch the pages whose titles match any g ∈ G from Wikipedia redirection
data. For each page pid of term t, Tc ← Tc ∪ {t : pid}.
4: Fetch the pages whose titles match any g ∈ G from Wikipedia page title
data. If a page is a disambiguation page, include all the terms it refers to. If a
page pid corresponds to a term t that is not ambiguous, Tc ← Tc ∪ {t : pid};
else Ta ← Ta ∪ {t : pid}.
5: For each term ta ∈ Ta , find the disambiguation that is on average most
related using Equation 4 to the set of clear terms. If a page pid of ta is
on average the most related to the terms in Tc , and the relatedness score is
above a threshold δ (δ ∈ [0, 1]), set Tc ← Tc ∪ {ta : pid}. If ta and a clear
term are different senses of the same term, keep the one that is more related
to all the other clear terms.
6: Return clear terms as features.
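Step 2 of Algorithm 4 (candidate term generation) can be sketched as below; the tokenization is deliberately simplistic:

```python
def candidate_ngrams(description, max_n=3):
    """Step 2 of Algorithm 4: all n-grams (n in [1, 3]) from a course
    description, to be matched against Wikipedia titles and redirects."""
    words = description.lower().split()
    return {" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}
```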
Example of Course Features
C1 : {1134:“Analysis”, 775:“Algorithm”}
{41985:“Shortest path problem”, 597584:“Tree traversal”, 455770:“Spanning tree”,
18955875:“Tree”, 1134:“Analysis”, 18568:“List of algorithms”,
56054:“Completeness”, 775:“Algorithm”, 144656:“Sorting”, 8519:“Data structure”,
93545:“Structure”, 8560:“Design”, 18985040:“Data”}
C2 : {5213:“Computing”}
{21347364:“Unix”, 289862:“Social”, 9258:“Ethics”, 6111038:“Object-oriented
design”, 5311:“Computer programming”, 72038:“C++”, 27471338:“Object-oriented
programming”, 8560:“Design”}
Lexical Semantic Vector
An algorithm similar to Algorithm 2 is used to determine each
value of an entry of the lexical semantic vector ŝ1i for features F1.
A semantic vector is defined as:
    SV1i = ŝ1i · I(ti) · I(tj)        (13)
Information Content
Information content I(t) of a term t:
    I(t) = γ · Ic(t) + (1 − γ) · Il(t)        (14)
Category information content Ic(t):
    Ic(t) = 1 − log(siblings(t) + 1) / log(N)        (15)
Linkage information content Il(t):
    Il(t) = 1 − (inlinks(pid) / MAXIN) · (outlinks(pid) / MAXOUT)        (16)
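A hedged sketch of equations 14–16; the default γ = 0.5 here is only an example value, and the sibling/link counts are invented:

```python
import math

def category_ic(siblings, n_total):
    """Equation 15: fewer siblings -> more specific -> higher IC."""
    return 1.0 - math.log(siblings + 1) / math.log(n_total)

def linkage_ic(inlinks, outlinks, max_in, max_out):
    """Equation 16: the most heavily linked pages get the lowest IC."""
    return 1.0 - (inlinks / max_in) * (outlinks / max_out)

def information_content(siblings, n_total, inlinks, outlinks,
                        max_in, max_out, gamma=0.5):
    """Equation 14: weighted mix of category and linkage IC."""
    return (gamma * category_ic(siblings, n_total)
            + (1.0 - gamma) * linkage_ic(inlinks, outlinks, max_in, max_out))
```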
Determine Course Relatedness
    f(C1, C2) = (SV1 · SV2) / (||SV1|| · ||SV2||)        (17)

    f(course1, course2) = [f(T1, T2) · (||FT1|| + ||FT2||) + f(C1, C2) · (||FC1|| + ||FC2||)]
                          / (||FT1|| + ||FT2|| + ||FC1|| + ||FC2||) + Ω        (18)
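Equation 18 weights the title relatedness and the description relatedness by the sizes of the respective feature sets; a minimal sketch in which Ω is passed through unchanged:

```python
def course_relatedness(f_title, f_desc, n_ft1, n_ft2, n_fc1, n_fc2, omega=0.0):
    """Equation 18: n_ft*/n_fc* are the title / description feature-set
    sizes ||F_T||, ||F_C||; omega is the adjustment term from the slide."""
    total = n_ft1 + n_ft2 + n_fc1 + n_fc2
    return (f_title * (n_ft1 + n_ft2) + f_desc * (n_fc1 + n_fc2)) / total + omega
```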
Experimental Results
Randomly select 25 CS courses from 19 universities that can
be transferred to UML according to the transfer dictionary.
Each transfer course is compared to all 44 CS courses offered
at UML.
The result is considered correct if the real equivalent course at
UML is among the top 3 in the list of highest scores.
Algorithm Accuracy
Proposed approach 72%
Li et al. [LMB+ 06] 52%
TF-IDF 32%
Table: Accuracy of the second approach against those of Li et al. and
TF-IDF
Experimental Results
Algorithm                            Pearson's correlation    p-value
TF-IDF                               0.730                    2·10^−6
Li et al. [LMB+06]                   0.570                    0.0006
Proposed approach (Features)         0.845                    1.13·10^−9
Proposed approach (Features + IC)    0.851                    6.65·10^−10

Table: Pearson's correlation of course relatedness scores with human
judgments.
Sensitivity Test
Testing the Sensitivity of Parameters α, β, and δ
[Figure: Pearson correlation as each parameter varies over 0.1–0.9 while the others are held fixed: α (β = 0.5, δ = 0.2), β (α = 0.2, δ = 0.2), and δ (α = 0.2, β = 0.5).]
Summary
Highlighted the problem of suggesting transfer course
equivalencies.
Proposed two semantic relatedness measures to tackle the
problem.
A semantic relatedness measure based on traditional
knowledge sources can be adapted.
Wikipedia is a better knowledge source than traditional
knowledge sources.
A domain-specific semantic relatedness measure built on top
of Wikipedia is well suited to suggesting transfer course
equivalencies.
Provided a human judgment data set over 32 pairs of courses:
http://bit.ly/semcourse.
Published Literature
Using Semantic Distance to Automatically Suggest Transfer Course
Equivalencies
Beibei Yang and Jesse M. Heines
ACL-HLT 2011: Proceedings of the Sixth Workshop on Innovative
Use of NLP for Building Educational Applications (BEA-6)
Association for Computational Linguistics
Domain-Specific Semantic Relatedness from Wikipedia: Can a
Course be Transferred?
Beibei Yang and Jesse M. Heines
NAACL-HLT 2012 Student Research Workshop
References
Bibliography I
Alexander Budanitsky and Graeme Hirst.
Evaluating WordNet-based measures of lexical semantic relatedness.
Computational Linguistics, 32:13–47, 2006.
Curt Burgess, Kay Livesay, and Kevin Lund.
Explorations in context space: words, sentences, discourse.
Discourse Processes, 25:211–257, 1998.
Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka.
Measuring semantic similarity between words using web search engines.
In Proceedings of the 16th International Conference on World Wide Web, pages 757–766, New York, NY,
USA, 2007. ACM.
Rudi L. Cilibrasi and Paul M. B. Vitanyi.
The Google similarity distance.
IEEE Transactions on Knowledge and Data Engineering, 19:370–383, 2007.
Evgeniy Gabrilovich and Shaul Markovitch.
Computing semantic relatedness using Wikipedia-based explicit semantic analysis.
In Proceedings of the 20th International Joint Conference on AI, 2007.
Evgeniy Gabrilovich and Shaul Markovitch.
Wikipedia-based semantic interpretation for NLP.
Journal of Artificial Intelligence Research, 34:443–498, 2009.
Frederick Hayes-Roth, Donald A. Waterman, and Douglas B. Lenat.
Building expert systems.
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1983.
References
Bibliography II
Graeme Hirst and David St-Onge.
WordNet: An electronic lexical database, chapter Lexical chains as representations of context for the
detection and correction of malapropisms, pages 305–332.
The MIT Press, Cambridge, MA, 1998.
Hideki Kozima and Teiji Furugori.
Similarity between words computed by spreading activation on an English dictionary.
In Proceedings of the 6th conference on European chapter of the Association for Computational Linguistics,
EACL ’93, pages 232–239, Stroudsburg, PA, USA, 1993. Association for Computational Linguistics.
Yuhua Li, Zuhair A. Bandar, and David McLean.
An approach for measuring semantic similarity between words using multiple information sources.
IEEE Transactions on Knowledge and Data Engineering, pages 871–882, 2003.
Claudia Leacock and Martin Chodorow.
Combining local context and WordNet similarity for word sense identification, pages 265–283.
The MIT Press, Cambridge, MA, 1998.
Thomas K Landauer, Peter W. Foltz, and Darrell Laham.
An introduction to latent semantic analysis.
Discourse Processes, 25(2-3):259–284, 1998.
Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett.
Sentence similarity based on semantic nets and corpus statistics.
IEEE Transactions on Knowledge and Data Engineering, 18(8):1138–1150, 2006.
References
Bibliography III
Jane Morris and Graeme Hirst.
Lexical cohesion computed by thesaural relations as an indicator of the structure of text.
Computational Linguistics, 17(1):21–48, March 1991.
Saif Mohammad and Graeme Hirst.
Distributional measures of concept-distance: A task-oriented evaluation.
In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, 2006.
Saif Mohammad.
Measuring Semantic Distance Using Distributional Profiles of Concepts.
PhD thesis, University of Toronto, Toronto, Canada, 2008.
Simone Paolo Ponzetto and Michael Strube.
Knowledge derived from Wikipedia for computing semantic relatedness.
Journal of Artificial Intelligence Research, 30:181–212, October 2007.
Philip Resnik.
Using information content to evaluate semantic similarity in a taxonomy.
In Proceedings of the 14th international joint conference on Artificial intelligence, volume 1 of IJCAI’95,
pages 448–453, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc.
Gerard Salton and Christopher Buckley.
Term weighting approaches in automatic text retrieval.
Information Processing and Management, 24:513–523, August 1988.
Mehran Sahami and Timothy D. Heilman.
A web-based kernel function for measuring the similarity of short text snippets.
In Proceedings of the 15th International Conference on the World Wide Web, pages 377–386, New York,
NY, USA, 2006. ACM.
References
Bibliography IV
Peter D. Turney.
Mining the web for synonyms: PMI-IR versus LSA on TOEFL.
In Luc De Raedt and Peter A. Flach, editors, ECML, volume 2167 of Lecture Notes in Computer Science,
pages 491–502. Springer, 2001.
Zhibiao Wu and Martha Palmer.
Verb semantics and lexical selection.
In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 133–138, 1994.
Dongqiang Yang and David M. W. Powers.
Measuring semantic similarity in the taxonomy of WordNet.
In Proceedings of the 28th Australasian Conference on Computer Science, volume 38, pages 315–322,
Darlinghurst, Australia, 2005. Australian Computer Society, Inc.