Introduction    Knowledge Sources   Related Work     First Approach   Second Approach   Summary   References

               Semantic Relatedness for Evaluation of Course Equivalencies
                                    Doctoral Dissertation Defense

                                                   Beibei Yang

                                      Department of Computer Science
                                      University of Massachusetts Lowell

                                                   July 23, 2012


          1    Introduction
          2    Knowledge Sources
          3    Related Work
          4    First Approach
          5    Second Approach
          6    Summary

NLP and Education

       Many NLP techniques have been adapted to the education field for:
               automated scoring and evaluation
               intelligent tutoring
               learner cognition
       However, few techniques address the identification of transfer
       course equivalencies.

Why is it important to suggest transfer course equivalencies?

       National Association for College Admission Counseling, 2010
       “. . . less attention is focused on the transfer admission process,
       which affects approximately one-third of students beginning at
       either a four- or two-year institution during the course of their
       postsecondary careers.”

       National Center for Education Statistics, 2005
       “For students who attained their bachelor’s degrees in 1999–2000,
       59.7 percent attended more than one institution during their
       undergraduate careers and 32.1 percent transferred at least once.”

UML’s course transfer dictionary

Course descriptions
       C1 : Analysis of Algorithms
       Discusses basic methods for designing and analyzing efficient algorithms emphasizing
       methods used in practice. Topics include sorting, searching, dynamic programming,
       greedy algorithms, advanced data structures, graph algorithms (shortest path,
       spanning trees, tree traversals), matrix operations, string matching, NP completeness.

       C2 : Computing III
       Object-oriented programming. Classes, methods, polymorphism, inheritance.
       Object-oriented design. C++. UNIX. Ethical and social issues.

       \[ f : (C_1, C_2) \to n, \qquad n \in [0, 1] \tag{1} \]

               C1 is a course from an external institution.
               C2 is a course offered at UML.

Knowledge Acquisition Bottleneck

       Semantic relatedness measures that rely on a traditional knowledge
       base usually suffer from the knowledge acquisition bottleneck.

       Knowledge acquisition is difficult for an expert
       system [HRWL83]:
               Representation mismatch: the difference between the way a human
               expert states knowledge and the way it is represented in the system.
               Knowledge inaccuracy: the difficulty for human experts to describe
               knowledge in terms that are precise, complete, and consistent
               enough for use in a computer program.
               Coverage problem: the difficulty of characterizing all of the relevant
               domain knowledge in a given representation system, even when the
               expert is able to correctly verbalize the knowledge.
               Maintenance trap: the time required to maintain a knowledge base.

Semantic Relatedness
       Three terms have been used interchangeably in related literature:
       semantic relatedness, semantic similarity, and semantic distance.

       Figure : The relations of semantic distance, semantic relatedness, and
       semantic similarity [BH06]. (Diagram labels: Semantic Distance, Semantic
       Relatedness, Semantic Similarity.)

Semantic Similarity versus Semantic Relatedness

       Semantic Similarity
               animal, cat: close
               human, cat: distant

       Semantic Relatedness
               cat, paw: close
               cat, hand: distant

Popular Knowledge Sources

          1    Lexicon-based Resources
          2    Corpus-based Resources
                     Project Gutenberg
                     British National Corpus
                     Penn Treebank
          3    Hybrid Resources

Related Work on Semantic Relatedness

          1    Lexicon-based
                     Dictionary [KF93]
                     Thesaurus [MH91]
                     WordNet [WP94, LC98, HSO98, YP05]
          2    Corpus-based
                     Query Expansion [SH06, BMI07, CV07]
                     LSA [LFL98]
                     HAL [BLL98]
                     PMI-IR [Tur01]
                     ESA (Wikipedia) [GM07, GM09]
          3    Hybrid
                     Information Content [Res95]
                     Distributional profiling [Moh06, Moh08]
                     Li et al. [LBM03, LMB+ 06]
                     Ponzetto and Strube (Wikipedia) [PS07]

A Fragment of the WordNet Taxonomy

       physical entity.n.01
           (unlabeled node)
               part.n.02
                   component.n.03
                       crystal.n.02
                           piezoelectric crystal.n.01
               whole.n.02
                   artifact.n.01
                       decoration.n.01
                           adornment.n.01
                               (unlabeled node)
                                   bracelet.n.02
                                   necklace.n.01
           matter.n.03
               solid.n.01
                   crystal.n.01
                       gem.n.02
                           transparent gem.n.01
                               diamond.n.02

The First Approach

          1    Semantic relatedness between two concepts: based on
               their path length and the depth of their common ancestor in
               the WordNet taxonomy.
          2    Semantic relatedness between two words: based on the
               previous step, and includes POS and WSD.
          3    Semantic relatedness between two sentences: constructs
               two semantic vectors, and takes into account the information
          4    Word order similarity (optional): “a dog bites a man” & “a
               man bites a dog”
          5    Semantic relatedness between paragraphs
          6    Semantic relatedness between courses

Concept Relatedness

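       Path function:
       \[ f_1(p) = e^{-\alpha p}, \qquad \alpha \in [0, 1] \tag{2} \]
       Depth function:
       \[ f_2(h) = \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}}, \qquad \beta \in [0, 1] \tag{3} \]
       Semantic relatedness between concepts c_1 and c_2:
       \[ f_{word}(c_1, c_2) = f_1(p) \cdot f_2(h) \tag{4} \]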
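
       A minimal sketch of Equations 2 to 4 may help make the two factors concrete. It assumes p is the path length between the two concepts and h is the depth of their common ancestor, as described on the First Approach slide. The function names are illustrative, and the defaults α = 0.2 and β = 0.5 are simply the fixed values shown on the sensitivity-test slide, not values the slides prescribe for this step.

```python
import math


def path_score(p: float, alpha: float = 0.2) -> float:
    """Equation 2: relatedness decays exponentially with path length p."""
    return math.exp(-alpha * p)


def depth_score(h: float, beta: float = 0.5) -> float:
    """Equation 3: grows with ancestor depth h (algebraically tanh(beta * h))."""
    pos, neg = math.exp(beta * h), math.exp(-beta * h)
    return (pos - neg) / (pos + neg)


def concept_relatedness(p: float, h: float,
                        alpha: float = 0.2, beta: float = 0.5) -> float:
    """Equation 4: product of the path and depth functions."""
    return path_score(p, alpha) * depth_score(h, beta)


# Same ancestor depth, different path lengths: the shorter path scores higher.
near = concept_relatedness(p=1, h=5)
far = concept_relatedness(p=4, h=5)
```

       Because the depth function is a hyperbolic tangent, it saturates toward 1 for deep common ancestors, so very specific shared ancestors are rewarded but with diminishing returns.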

Semantic Relatedness Between Words

       Algorithm 1 Semantic Relatedness Between Words
         1:    If two words w1 and w2 have different POS, consider them
               semantically distant. Return 0.
         2:    If w1 and w2 have the same POS and look the same but do not
               exist in WordNet, consider them semantically close. Return 1.
         3:    Using either maximum scores or the first sense heuristic to
               perform WSD, measure the semantic relatedness between w1 and
               w2 using Equation 4.
         4:    Using the same WSD strategy as the previous step, measure the
               semantic relatedness between the stemmed w1 and the stemmed
               w2 using Equation 4.
         5:    Return the larger of the two results in steps (3) and (4), i.e.,
               the score of the pair that is semantically closer.
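
       The control flow of Algorithm 1 can be sketched as follows. The WordNet lookup, the WSD-plus-Equation-4 scoring, and the stemmer are passed in as callables, because the slides do not show how they are implemented; `in_lexicon`, `concept_score`, `stem`, and all the toy data below are hypothetical stand-ins.

```python
from typing import Callable


def word_relatedness(
    w1: str, pos1: str, w2: str, pos2: str,
    in_lexicon: Callable[[str], bool],
    concept_score: Callable[[str, str], float],
    stem: Callable[[str], str],
) -> float:
    # Step 1: different parts of speech are treated as semantically distant.
    if pos1 != pos2:
        return 0.0
    # Step 2: identical surface forms missing from the lexicon count as close.
    if w1 == w2 and not in_lexicon(w1):
        return 1.0
    # Steps 3 and 4: score the raw pair and the stemmed pair the same way.
    raw = concept_score(w1, w2)
    stemmed = concept_score(stem(w1), stem(w2))
    # Step 5: keep the larger of the two scores.
    return max(raw, stemmed)


# Toy stand-ins (hypothetical): a two-entry score table and a prefix "stemmer".
TOY_SCORES = {frozenset({"computer", "computing"}): 0.4, frozenset({"comput"}): 1.0}
toy_lexicon = {"computer", "computing"}


def toy_score(a: str, b: str) -> float:
    return TOY_SCORES.get(frozenset({a, b}), 0.0)


def toy_stem(word: str) -> str:
    return word[:6]


score = word_relatedness("computer", "n", "computing", "n",
                         toy_lexicon.__contains__, toy_score, toy_stem)
```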

Construct a List of Joint Words

       To measure the semantic relatedness between sentences S1 and
       S2 , first join them into a unique word set S, with a length of n:

       \[ S = S_1 \cup S_2 = \{w_1, w_2, \ldots, w_n\} \tag{5} \]

       S1 :    introduction        to      computer       programming

       S2 :    introduction        to      computing       environments

       S:      introduction        to      computer       programming      computing        environments
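
       As a small illustration of Equation 5, the joint word set for the two example titles can be built with an order-preserving union. Whitespace tokenization and first-seen ordering are assumptions made here for readability; the slides only require that S be the set union.

```python
def joint_word_set(s1: list[str], s2: list[str]) -> list[str]:
    """Union of both sentences' words, keeping first-seen order."""
    return list(dict.fromkeys(s1 + s2))


s1 = "introduction to computer programming".split()
s2 = "introduction to computing environments".split()
joint = joint_word_set(s1, s2)
```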

Construct a Lexical Semantic Vector

       Algorithm 2 Lexical Semantic Vector ŝ_1 for S_1
         1:    for all words w_i ∈ S do
         2:      if w_i ∈ S_1, set ŝ_{1i} = 1, where ŝ_{1i} ∈ ŝ_1.
         3:      if w_i ∉ S_1, the semantic relatedness between w_i and each
                 word w_{1j} ∈ S_1 is calculated using Algorithm 1. Set ŝ_{1i} to the
                 highest score if the score exceeds a preset threshold δ (δ ∈
                 [0, 1]), otherwise ŝ_{1i} = 0.
         4:      Let γ ∈ [1, n] be the maximum number of times a word w_{1j} ∈
                 S_1 is chosen as semantically the closest word of w_i. Let
                 the semantic relatedness of w_i and w_{1j} be d, and f_{1j} be
                 the number of times that w_{1j} is chosen. If f_{1j} > γ, set
                 ŝ_{1i} = d/f_{1j} to give a penalty to w_{1j}. This step is called
         5:    end for
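
       Steps 2 and 3 of Algorithm 2 might look like the sketch below. The penalty in step 4 is deliberately left out, the word-level scorer is a hypothetical stand-in for Algorithm 1, and the threshold default δ = 0.2 is an assumption.

```python
from typing import Callable, Sequence


def lexical_semantic_vector(
    joint: Sequence[str],
    sentence: Sequence[str],
    relatedness: Callable[[str, str], float],
    delta: float = 0.2,
) -> list[float]:
    present = set(sentence)
    vector = []
    for word in joint:
        if word in present:
            vector.append(1.0)  # step 2: the word occurs in the sentence
            continue
        # Step 3: best score against any word of the sentence, thresholded.
        best = max((relatedness(word, other) for other in sentence), default=0.0)
        vector.append(best if best > delta else 0.0)
    return vector


def toy_relatedness(a: str, b: str) -> float:
    # Hypothetical scores standing in for Algorithm 1.
    return 0.6 if {a, b} == {"computer", "computing"} else 0.1


joint = ["introduction", "to", "computer", "programming", "computing", "environments"]
s1 = ["introduction", "to", "computer", "programming"]
vec = lexical_semantic_vector(joint, s1, toy_relatedness)
```

       With these toy scores, "computing" borrows its entry from "computer", while "environments" falls below the threshold and contributes nothing.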

First-level Sentence Relatedness

                            TFIDF(w_i) = tf_i · idf_i = tf_i · log                              (6)

       Semantic vector SV_1 for sentence S_1:

       SV_{1i} = ŝ_{1i} · (TFIDF(w_i) + ) · (TFIDF(w_{1j}) + ),      (i ∈ [1, n], j ∈ [1, t])

First-level Sentence Relatedness

       \[ f_{sent}^{(1)}(S_1, S_2) = \frac{SV_1 \cdot SV_2}{\|SV_1\| \cdot \|SV_2\|} \tag{8} \]
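
       Equation 8 is the cosine of the angle between the two semantic vectors. A plain-Python sketch follows; returning 0 when either vector has zero norm is a convention chosen here, not something the slides specify.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


same_direction = cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])
```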

Second-level Sentence Relatedness

       Word order similarity:
       \[ f_{order}(S_1, S_2) = 1 - \frac{\|Q_1 - Q_2\|}{\|Q_1 + Q_2\|} \tag{9} \]
                          Q_1, Q_2: word order vectors of S_1 and S_2.

       Second-level Sentence Relatedness:
       \[ f_{sent}^{(2)}(S_1, S_2) = \tau \cdot f_{sent}^{(1)}(S_1, S_2) + (1 - \tau) \cdot f_{order}(S_1, S_2), \qquad \tau \in [0, 1] \tag{10} \]
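
       The sketch below works through Equation 9 and the second-level combination on the "dog bites man" example. The slides do not say how the word order vectors are built, so this version makes a simplifying assumption: each entry is the 1-based position of the first exact match of the joint word in the sentence, or 0 if the word is absent. The default τ = 0.8 is likewise hypothetical.

```python
import math


def order_vector(joint: list[str], sentence: list[str]) -> list[int]:
    """1-based position of each joint word in the sentence, 0 if absent."""
    return [sentence.index(w) + 1 if w in sentence else 0 for w in joint]


def word_order_similarity(s1: list[str], s2: list[str]) -> float:
    joint = list(dict.fromkeys(s1 + s2))
    q1, q2 = order_vector(joint, s1), order_vector(joint, s2)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(q1, q2)))
    return 1.0 - diff / total if total else 1.0


def second_level(first_level: float, order: float, tau: float = 0.8) -> float:
    """Weighted mix of first-level relatedness and word order similarity."""
    return tau * first_level + (1 - tau) * order


a = "a dog bites a man".split()
b = "a man bites a dog".split()
order = word_order_similarity(a, b)
```

       Both sentences contain exactly the same words, so a bag-of-words comparison cannot tell them apart; the order term is what lowers the combined score.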

Semantic Relatedness Between Paragraphs
       \[ f_{para}(P_1, P_2) = \frac{\sum_{i=1}^{n} \left( \max_{j=1}^{m} f_{sent}(s_{1i}, s_{2j}) \right) \cdot N_i}{\sum_{i=1}^{n} N_i} \tag{11} \]
       Algorithm 3 Semantic Relatedness for Paragraphs
         1:    If deletion is enabled, given two course descriptions, select the one with
               fewer sentences as P1, and the other as P2. If deletion is disabled,
               select the first course description as P1, and the other as P2.
         2:    for each sentence s_{1i} ∈ P1 do
         3:       Calculate the semantic relatedness between sentences using
                  Equation 10 for s_{1i} and each of the sentences in P2.
         4:       Find the sentence pair s_{1i}, s_{2j} (s_{2j} ∈ P2) that scores the highest.
                  Save the highest score and the total number of words of s_{1i} and
                  s_{2j}. If deletion is enabled, remove sentence s_{2j} from P2.
         5:    end for
         6:    Collect the highest score and the number of words from each run.
               Use their weighted mean from Equation 11 as the semantic relatedness
               between P1 and P2.
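
       Algorithm 3 and Equation 11 can be sketched as a greedy best-match loop. Sentences are represented as token lists, and the sentence scorer is injected; the Jaccard overlap used in the example is only a hypothetical stand-in for the sentence relatedness measure.

```python
from typing import Callable

Sentence = list[str]


def paragraph_relatedness(
    p1: list[Sentence],
    p2: list[Sentence],
    sentence_score: Callable[[Sentence, Sentence], float],
    deletion: bool = True,
) -> float:
    # Step 1: with deletion enabled, iterate over the shorter description.
    if deletion and len(p2) < len(p1):
        p1, p2 = p2, p1
    remaining = list(p2)
    weighted_sum, total_words = 0.0, 0
    for s1 in p1:
        if not remaining:
            break
        # Steps 3 and 4: best-scoring partner for s1 among the remaining sentences.
        best = max(remaining, key=lambda s2: sentence_score(s1, s2))
        n_words = len(s1) + len(best)  # N_i: total words in the matched pair
        weighted_sum += sentence_score(s1, best) * n_words
        total_words += n_words
        if deletion:
            remaining.remove(best)
    # Step 6: word-count-weighted mean of the best scores.
    return weighted_sum / total_words if total_words else 0.0


def jaccard(a: Sentence, b: Sentence) -> float:
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


p1 = [["sorting", "searching"], ["graph", "algorithms"]]
p2 = [["graph", "algorithms"], ["sorting", "searching"], ["ethical", "issues"]]
score = paragraph_relatedness(p1, p2, jaccard)
```

       Weighting by N_i means long matched sentence pairs influence the paragraph score more than short ones.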

Semantic Relatedness Between Courses

       \[ f_{course}(C_1, C_2) = \theta \cdot f_{sent}(T_1, T_2) + (1 - \theta) \cdot f_{para}(P_1, P_2), \qquad \theta \in [0, 1] \tag{12} \]

Data sets

                   Data Sets          MCC Courses              UML Courses           Total
                   Small              25                       24                    49
                   Medium             55                       50                    105
                   Large              108                      89                    197
                            Table : Number of courses in the data sets

Experimental Results

                  Compared against the method by Li et al. [LMB+ 06] and
                  TF-IDF [SB88]:
       [Figure, two panels, each plotted against the number of documents (49,
       105, 197). Left: "Accuracy Comparison". Right: "Average ranks of the real
       equivalent courses" (vertical axis: average rank). Series in both panels:
       Enable word order, Disable word order, TFIDF, Li. Each panel carries a
       "Best case" annotation.]

Experimental Results

       Performance of two word sense disambiguation algorithms:
       [Figure: "Accuracy Comparison of WSD", plotted against the number of
       documents (49, 105, 197). Surviving labels: FIRST SENSE, Best case.]

What’s Wrong with WordNet?

       91.304 Foundations of Computer Science
       A survey of the mathematical foundations of Computer Science. Finite
       automata and regular languages. Stack Acceptors and Context-Free
       Languages. Turing Machines, recursive and recursively enumerable sets.
       Decidability. Complexity. This course involves no computer programming.

       64 unfiltered words fetched from WordNet
       acceptor, adjust, arrange, automaton, basis, batch, bent, calculator, car,
       class, complexity, computer, countable, course, determine, dress, even,
       finite, fix, foundation, foundation garment, fructify, hardening, imply,
       initiation, involve, jell, language, linguistic process, lyric, machine,
       mathematical, naturally, necessitate, numerical, path, place, plant,
       push-down list, push-down storage, put, recursive, regular, review, rig,
       run, science, set, set up, sic, sketch, skill, smokestack, specify, speech,
       stack, stage set, surveil, survey, terminology, turing, typeset,
       unconstipated, view.

What’s Wrong with WordNet?

       91.304 Foundations of Computer Science
       A survey of the mathematical foundations of Computer Science. Finite
       automata and regular languages. Stack Acceptors and Context-Free
       Languages. Turing Machines, recursive and recursively enumerable sets.
       Decidability. Complexity. This course involves no computer programming.

       18 articles fetched from Wikipedia using the second approach
       Alan Turing, Algorithm, Automata theory, Complexity, Computer,
       Computer science, Context-free language, Enumeration, Finite set,
       Finite-state machine, Kolmogorov complexity, Language, Machine,
       Mathematics, Recursive, Recursive language, Recursively enumerable set,
       Set theory.


Growth of Wikipedia and WordNet over the years

       [Figure: "Growth of English Wikipedia and WordNet". Series: Articles in
       Wikipedia, Synsets in WordNet. Vertical axis: article/synset count.
       Horizontal axis: years 1992 to 2012.]

WordNet versus Wikipedia
       Fragments of WordNet and Wikipedia Taxonomies

       WordNet [Root: synset("technology"), #depth: 2]
               # nodes: 25

       Wikipedia [Centroid: "Category:Technology", #steps: 2]
               # nodes: 3583

Extract a Lexicographical Hierarchy from Wikipedia
          1    Let’s assume the knowledge domain is specified, e.g.,
               “Category:Computer science.”
          2    Choose its parent as the root, i.e., “Category:Applied sciences.”
          3    Use a depth-limited search to recursively traverse each
               subcategory (including subpages) to build a lexicographical
               hierarchy with depth D.
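
       The depth-limited traversal in step 3 could be sketched as below, under the assumption that the category graph is available as an in-memory adjacency mapping. The toy graph is made up for illustration and does not reflect Wikipedia's actual category structure; `build_hierarchy` and `TOY_GRAPH` are hypothetical names.

```python
def build_hierarchy(root: str, children_of: dict[str, list[str]],
                    max_depth: int) -> dict[str, int]:
    """Depth-limited traversal; returns {node: shallowest depth found}."""
    depth_of = {root: 0}

    def visit(node: str, depth: int) -> None:
        if depth == max_depth:
            return
        for child in children_of.get(node, ()):
            # Revisit a node only when a shorter route to it is found.
            if child not in depth_of or depth + 1 < depth_of[child]:
                depth_of[child] = depth + 1
                visit(child, depth + 1)

    visit(root, 0)
    return depth_of


# Toy graph (hypothetical), including a cycle to show that traversal terminates.
TOY_GRAPH = {
    "Category:Applied sciences": ["Category:Computer science", "Category:Engineering"],
    "Category:Computer science": ["Algorithm", "Category:Programming"],
    "Category:Programming": ["C++", "Category:Computer science"],
}
hierarchy = build_hierarchy("Category:Applied sciences", TOY_GRAPH, max_depth=2)
```

       Recording the shallowest depth per node matters because Wikipedia categories form a graph rather than a strict tree, so the same page can be reached along several routes.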

Growth of the Hierarchy from Wikipedia

       Depth: 1    Total Nodes: 72
       Depth: 2    Total Nodes: 4,249
       Depth: 3    Total Nodes: 64,407

       Growth of the lexicographical hierarchy constructed from Wikipedia, illustrated in
       circular trees. A lighter color of the nodes and edges indicates that they are at a
       deeper depth in the hierarchy.

Lexicographical Hierarchy constructed from Wikipedia

                       Depth (D)          Number of concepts at this level
                           1              71
                           2              4,177
                           3              60,158
                           4              177,955
                           5              494,039
                           6              1,848,052
       Table : Number of concepts for each depth in the “Category:Applied
       sciences” hierarchy.

               The hierarchy includes only 1,534,267 distinct articles out of
               the 5,329,186 articles in Wikipedia, so over 71% of Wikipedia
               articles are eliminated.

Generate Course Description Features

       Algorithm 4 Feature Generation (F ) for Course C
        1: Tc ← ∅ (clear terms), Ta ← ∅ (ambiguous terms).
        2: Generate all possible n-grams (n ∈ [1, 3]) G from C.
        3: Fetch the pages whose titles match any of g ∈ G from Wikipedia redirection
           data. For each page pid of term t, Tc ← Tc ∪ {t : pid}.
        4: Fetch the pages whose titles match any of g ∈ G from Wikipedia page title
           data. If a disambiguation page, include all the terms this page refers to. If a
           page pid corresponds to a term t that is not ambiguous, Tc ← Tc ∪{t : pid},
           else Ta ← Ta ∪ {t : pid}.
        5: For each term ta ∈ Ta , find the disambiguation that is on average most
           related using Equation 4 to the set of clear terms. If a page pid of ta is
           on average the most related to the terms in Tc , and the relatedness score is
           above a threshold δ (δ ∈ [0, 1]), set Tc ← Tc ∪ {ta : pid}. If ta and a clear
           term are different senses of the same term, keep the one that is more related
           to all the other clear terms.
        6: Return clear terms as features.
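
       Step 2 of Algorithm 4, generating every n-gram with n from 1 to 3, might be sketched as follows. The tokenizer is an assumption: it lowercases the text and keeps "+" and "-" inside tokens so that terms such as "C++" and "object-oriented" survive. Matching the candidates against Wikipedia titles and redirects is not shown.

```python
import re


def ngrams(text: str, max_n: int = 3) -> set[str]:
    """All word n-grams of length 1..max_n, joined with single spaces."""
    tokens = re.findall(r"[\w+-]+", text.lower())
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


candidates = ngrams("Object-oriented design. C++.")
```

       Note that this simple version lets n-grams span sentence boundaries (for example "design c++"); such candidates would simply fail to match any page title in the later steps.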

Example of Course Features

       C1 : {1134:“Analysis”, 775:“Algorithm”}
       {41985:“Shortest path problem”, 597584:“Tree traversal”, 455770:“Spanning tree”,
       18955875:“Tree”, 1134:“Analysis”, 18568:“List of algorithms”,
       56054:“Completeness”, 775:“Algorithm”, 144656:“Sorting”, 8519:“Data structure”,
       93545:“Structure”, 8560:“Design”, 18985040:“Data”}

       C2 : {5213:“Computing”}
       {21347364:“Unix”, 289862:“Social”, 9258:“Ethics”, 6111038:“Object-oriented
       design”, 5311:“Computer programming”, 72038:“C++”, 27471338:“Object-oriented
       programming”, 8560:“Design”}


Lexical Semantic Vector

               An algorithm similar to Algorithm 2 is used to determine the
               value of each entry of the lexical semantic vector ŝ_1 for features
               F_1.
               A semantic vector is defined as:
               \[ SV_{1i} = \hat{s}_{1i} \cdot I(t_i) \cdot I(t_j) \tag{13} \]

Information Content

               Information content I(t) of a term t:

               \[ I(t) = \gamma \cdot I_c(t) + (1 - \gamma) \cdot I_l(t) \tag{14} \]
               Category information content I_c(t):
               \[ I_c(t) = 1 - \frac{\log(\mathrm{siblings}(t) + 1)}{\log(N)} \tag{15} \]
               Linkage information content I_l(t):
               \[ I_l(t) = 1 - \frac{\mathrm{inlinks}(pid)}{\mathrm{MAXIN}} \cdot \frac{\mathrm{outlinks}(pid)}{\mathrm{MAXOUT}} \tag{16} \]
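
       Equations 14 to 16 translate directly into a few small functions. The slides do not define N, MAXIN, or MAXOUT, so the sketch assumes N is a total count used for normalization and that MAXIN and MAXOUT are the largest in-link and out-link counts observed; the default γ = 0.5 is also an assumption.

```python
import math


def category_ic(siblings: int, n_total: int) -> float:
    """Equation 15: terms with fewer siblings are treated as more informative."""
    return 1.0 - math.log(siblings + 1) / math.log(n_total)


def linkage_ic(inlinks: int, outlinks: int, max_in: int, max_out: int) -> float:
    """Equation 16: heavily linked pages are treated as less informative."""
    return 1.0 - (inlinks / max_in) * (outlinks / max_out)


def information_content(ic_category: float, ic_linkage: float,
                        gamma: float = 0.5) -> float:
    """Equation 14: convex combination of the two components."""
    return gamma * ic_category + (1.0 - gamma) * ic_linkage


# Hypothetical counts for a single term.
ic = information_content(category_ic(siblings=9, n_total=1000),
                         linkage_ic(inlinks=50, outlinks=20, max_in=1000, max_out=200))
```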

Determine Course Relatedness

       \[ f(C_1, C_2) = \frac{SV_1 \cdot SV_2}{\|SV_1\| \cdot \|SV_2\|} \tag{17} \]

       \[ f(course_1, course_2) = \frac{f(T_1, T_2) \cdot (\|F_{T1}\| + \|F_{T2}\|) + f(C_1, C_2) \cdot (\|F_{C1}\| + \|F_{C2}\|)}{\|F_{T1}\| + \|F_{T2}\| + \|F_{C1}\| + \|F_{C2}\|} + \Omega \]

Experimental Results

               Randomly select 25 CS courses from 19 universities that can
               be transferred to UML according to the transfer dictionary.
               Each transfer course is compared to all 44 CS courses offered
               at UML.
               The result is considered correct if the real equivalent course at
               UML is among the top 3 in the list of highest scores.

                                    Algorithm                      Accuracy
                                    Proposed approach              72%
                                    Li et al. [LMB+ 06]            52%
                                    TF-IDF                         32%
       Table : Accuracy of the second approach against those of Li et al. and TF-IDF.
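
       The top-3 criterion described above can be expressed as a small evaluation helper. The course labels and scores below are invented purely to exercise the function and are not data from the experiment.

```python
def top_k_accuracy(scores: dict[str, dict[str, float]],
                   truth: dict[str, str], k: int = 3) -> float:
    """Fraction of queries whose true equivalent is among the k best-scored candidates."""
    hits = 0
    for query, candidates in scores.items():
        ranked = sorted(candidates, key=candidates.get, reverse=True)[:k]
        hits += truth[query] in ranked
    return hits / len(scores)


# Hypothetical relatedness scores: external course -> {local course: score}.
toy_scores = {
    "ext-1": {"uml-a": 0.9, "uml-b": 0.4, "uml-c": 0.3, "uml-d": 0.1},
    "ext-2": {"uml-a": 0.2, "uml-b": 0.8, "uml-c": 0.7, "uml-d": 0.6},
}
toy_truth = {"ext-1": "uml-c", "ext-2": "uml-a"}
accuracy = top_k_accuracy(toy_scores, toy_truth)
```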

Experimental Results

         Algorithm                                            Pearson’s correlation    p-value
         TF-IDF                                               0.730                    2 × 10^-6
         Li et al. [LMB+ 06]                                  0.570                    0.0006
         Proposed approach (Features)                         0.845                    1.13 × 10^-9
         Proposed approach (Features + IC)                    0.851                    6.65 × 10^-10
       Table : Pearson’s correlation of course relatedness scores with human judgments.
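
       For reference, Pearson's correlation between system scores and human ratings can be computed as in the sketch below. The two short lists are made-up numbers, not the 32 course pairs from the study, and the p-value computation is omitted.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings and scores for four course pairs.
human = [0.1, 0.4, 0.5, 0.9]
system = [0.2, 0.3, 0.6, 0.8]
r = pearson(human, system)
```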

Sensitivity Test
       [Figure: "Testing the Sensitivity of Parameters α, β, and δ". Three panels,
       each plotting Pearson correlation as one parameter varies from 0.1 to 0.9:
       α changes (β = 0.5, δ = 0.2); β changes (α = 0.2, δ = 0.2); δ changes
       (α = 0.2, β = 0.5).]


               Highlights the problem of suggesting transfer course
               equivalencies.
               Proposes two semantic relatedness measures to tackle the
               problem.
               A semantic relatedness measure based on traditional
               knowledge sources can be adapted.
               Wikipedia is a better knowledge source compared to
               traditional knowledge sources.
               A domain-specific semantic relatedness measure built on top
               of Wikipedia is well suited for suggesting transfer course
               equivalencies.
               Provides a human judgment data set over 32 pairs of courses:

Published Literature

               Using Semantic Distance to Automatically Suggest Transfer Course
               Equivalencies
               Beibei Yang and Jesse M. Heines
               ACL-HLT 2011: Proceedings of the Sixth Workshop on Innovative
               Use of NLP for Building Educational Applications (BEA-6)
               Association for Computational Linguistics
               Domain-Specific Semantic Relatedness from Wikipedia: Can a
               Course be Transferred?
               Beibei Yang and Jesse M. Heines
               NAACL-HLT 2012 Student Research Workshop


Bibliography I
               Alexander Budanitsky and Graeme Hirst.
               Evaluating WordNet-based measures of lexical semantic relatedness.
               Computational Linguistics, 32:13–47, 2006.

               Curt Burgess, Kay Livesay, and Kevin Lund.
               Explorations in context space: words, sentences, discourse.
               Discourse Processes, 25:211–257, 1998.

               Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka.
               Measuring semantic similarity between words using web search engines.
               In Proceedings of the 16th International Conference on World Wide Web, pages 757–766, New York, NY,
               USA, 2007. ACM.

               Rudi L. Cilibrasi and Paul M. B. Vitanyi.
               The Google similarity distance.
               IEEE Transactions on Knowledge and Data Engineering, 19:370–383, 2007.

               Evgeniy Gabrilovich and Shaul Markovitch.
               Computing semantic relatedness using Wikipedia-based explicit semantic analysis.
               In Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.

               Evgeniy Gabrilovich and Shaul Markovitch.
               Wikipedia-based semantic interpretation for NLP.
               Journal of Artificial Intelligence Research, 34:443–498, 2009.

               Frederick Hayes-Roth, Donald A. Waterman, and Douglas B. Lenat.
               Building expert systems.
               Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1983.


Bibliography II

               Graeme Hirst and David St-Onge.
               Lexical chains as representations of context for the detection and correction of malapropisms.
               In WordNet: An Electronic Lexical Database, pages 305–332.
               The MIT Press, Cambridge, MA, 1998.

               Hideki Kozima and Teiji Furugori.
               Similarity between words computed by spreading activation on an English dictionary.
               In Proceedings of the 6th conference on European chapter of the Association for Computational Linguistics,
               EACL ’93, pages 232–239, Stroudsburg, PA, USA, 1993. Association for Computational Linguistics.

               Yuhua Li, Zuhair A. Bandar, and David McLean.
               An approach for measuring semantic similarity between words using multiple information sources.
               IEEE Transactions on Knowledge and Data Engineering, pages 871–882, 2003.

               Claudia Leacock and Martin Chodorow.
               Combining local context and WordNet similarity for word sense identification, pages 265–283.
               The MIT Press, Cambridge, MA, 1998.

               Thomas K Landauer, Peter W. Foltz, and Darrell Laham.
               An introduction to latent semantic analysis.
               Discourse Processes, 25(2-3):259–284, 1998.

               Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett.
               Sentence similarity based on semantic nets and corpus statistics.
               IEEE Transactions on Knowledge and Data Engineering, 18(8):1138–1150, 2006.


Bibliography III
               Jane Morris and Graeme Hirst.
               Lexical cohesion computed by thesaural relations as an indicator of the structure of text.
               Computational Linguistics, 17(1):21–48, March 1991.

               Saif Mohammad and Graeme Hirst.
               Distributional measures of concept-distance: A task-oriented evaluation.
               In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, 2006.

               Saif Mohammad.
               Measuring Semantic Distance Using Distributional Profiles of Concepts.
               PhD thesis, University of Toronto, Toronto, Canada, 2008.

               Simone Paolo Ponzetto and Michael Strube.
               Knowledge derived from Wikipedia for computing semantic relatedness.
               Journal of Artificial Intelligence Research, 30:181–212, October 2007.

               Philip Resnik.
               Using information content to evaluate semantic similarity in a taxonomy.
               In Proceedings of the 14th international joint conference on Artificial intelligence, volume 1 of IJCAI’95,
               pages 448–453, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc.

               Gerard Salton and Christopher Buckley.
               Term weighting approaches in automatic text retrieval.
               Information Processing and Management, 24:513–523, August 1988.

               Mehran Sahami and Timothy D. Heilman.
               A web-based kernel function for measuring the similarity of short text snippets.
               In Proceedings of the 15th International Conference on the World Wide Web, pages 377–386, New York,
               NY, USA, 2006. ACM.


Bibliography IV

               Peter D. Turney.
               Mining the web for synonyms: PMI-IR versus LSA on TOEFL.
               In Luc De Raedt and Peter A. Flach, editors, ECML, volume 2167 of Lecture Notes in Computer Science,
               pages 491–502. Springer, 2001.

               Zhibiao Wu and Martha Palmer.
               Verb semantics and lexical selection.
               In Proceedings 32nd Annual Meeting on Association for Computational Linguistics, pages 133–138, 1994.

               Dongqiang Yang and David M. W. Powers.
               Measuring semantic similarity in the taxonomy of WordNet.
               In Proceedings of the 28th Australasian Conference on Computer Science, volume 38, pages 315–322,
               Darlinghurst, Australia, 2005. Australian Computer Society, Inc.

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf

Semantic Relatedness for Evaluation of Course Equivalencies

  • 1. Introduction Knowledge Sources Related Work First Approach Second Approach Summary References Semantic Relatedness for Evaluation of Course Equivalencies Doctoral Dissertation Defense Beibei Yang Department of Computer Science University of Massachusetts Lowell July 23, 2012
  • 2. Outline: 1 Introduction; 2 Knowledge Sources; 3 Related Work; 4 First Approach; 5 Second Approach; 6 Summary.
  • 3. NLP and Education
      Many NLP techniques have been adapted to the education field for automated scoring and evaluation, intelligent tutoring, and learner cognition. However, few techniques address the identification of transfer course equivalencies.
  • 4. Why is it important to suggest transfer course equivalencies?
      National Association for College Admission Counseling, 2010: “... less attention is focused on the transfer admission process, which affects approximately one-third of students beginning at either a four- or two-year institution during the course of their postsecondary careers.”
      National Center for Education Statistics, 2005: “For students who attained their bachelor’s degrees in 1999–2000, 59.7 percent attended more than one institution during their undergraduate careers and 32.1 percent transferred at least once.”
  • 5. UML’s course transfer dictionary
  • 6. Course descriptions
      C1: Analysis of Algorithms. Discusses basic methods for designing and analyzing efficient algorithms emphasizing methods used in practice. Topics include sorting, searching, dynamic programming, greedy algorithms, advanced data structures, graph algorithms (shortest path, spanning trees, tree traversals), matrix operations, string matching, NP completeness.
      C2: Computing III. Object-oriented programming. Classes, methods, polymorphism, inheritance. Object-oriented design. C++. UNIX. Ethical and social issues.
      $f : (C_1, C_2) \to n, \quad n \in [0, 1]$ (1)
      C1 is a course from an external institution. C2 is a course offered at UML. See slide 34.
  • 7. Knowledge Acquisition Bottleneck
      Semantic relatedness measures that rely on a traditional knowledge base usually suffer from the knowledge acquisition bottleneck. Knowledge acquisition is difficult for an expert system [HRWL83]:
      - Representation mismatch: the difference between the way a human expert states knowledge and the way it is represented in the system.
      - Knowledge inaccuracy: the difficulty for human experts to describe knowledge in terms that are precise, complete, and consistent enough for use in a computer program.
      - Coverage problem: the difficulty of characterizing all of the relevant domain knowledge in a given representation system, even when the expert is able to correctly verbalize the knowledge.
      - Maintenance trap: the time required to maintain a knowledge base.
  • 8. Semantic Relatedness
      Three terms have been used interchangeably in related literature: semantic relatedness, semantic similarity, and semantic distance.
      [Figure: The relations of semantic distance, semantic relatedness, and semantic similarity [BH06].]
  • 9. Semantic Similarity versus Semantic Relatedness
      Semantic similarity: animal and cat are close; human and cat are distant.
      Semantic relatedness: cat and paw are close; cat and hand are distant.
  • 10. Popular Knowledge Sources
      1 Lexicon-based resources: dictionaries, thesauri, WordNet, Cyc
      2 Corpus-based resources: Project Gutenberg, British National Corpus, Penn Treebank
      3 Hybrid resources: Wikipedia, Wiktionary
  • 11. Related Work on Semantic Relatedness
      1 Lexicon-based: Dictionary [KF93]; Thesaurus [MH91]; WordNet [WP94, LC98, HSO98, YP05]
      2 Corpus-based: Query Expansion [SH06, BMI07, CV07]; LSA [LFL98]; HAL [BLL98]; PMI-IR [Tur01]; ESA (Wikipedia) [GM07, GM09]
      3 Hybrid: Information Content [Res95]; Distributional profiling [Moh06, Moh08]; Li et al. [LBM03, LMB+06]; Ponzetto and Strube (Wikipedia) [PS07]
  • 12. A Fragment of the WordNet Taxonomy
      [Figure: tree fragment of the WordNet taxonomy rooted at entity.n.01, containing the synsets physical entity.n.01, object.n.01, matter.n.03, part.n.02, whole.n.02, solid.n.01, component.n.03, artifact.n.01, crystal.n.01, crystal.n.02, decoration.n.01, gem.n.02, piezoelectric crystal.n.01, adornment.n.01, transparent gem.n.01, jewelry.n.01, diamond.n.02, bracelet.n.02, and necklace.n.01.]
  • 13. The First Approach
      1 Semantic relatedness between two concepts: based on their path length and the depth of their common ancestor in the WordNet taxonomy.
      2 Semantic relatedness between two words: based on the previous step, and includes POS and WSD.
      3 Semantic relatedness between two sentences: constructs two semantic vectors, and takes into account the information content.
      4 Word order similarity (optional): “a dog bites a man” versus “a man bites a dog”.
      5 Semantic relatedness between paragraphs.
      6 Semantic relatedness between courses.
  • 14. Concept Relatedness
      Path function: $f_1(p) = e^{-\alpha p}, \quad \alpha \in [0, 1]$ (2)
      Depth function: $f_2(h) = \dfrac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}}, \quad \beta \in [0, 1]$ (3)
      Semantic relatedness between concepts $c_1$ and $c_2$: $f_{word}(c_1, c_2) = f_1(p) \cdot f_2(h)$ (4)
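The two functions in Equations 2 to 4 can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names are made up here, and the default values for alpha and beta are only placeholders. Note that Equation 3 is the hyperbolic tangent of beta times h.

```python
import math

def path_score(p: float, alpha: float = 0.2) -> float:
    """Equation 2: decays exponentially as the path length p grows."""
    return math.exp(-alpha * p)

def depth_score(h: float, beta: float = 0.5) -> float:
    """Equation 3: (e^{bh} - e^{-bh}) / (e^{bh} + e^{-bh}), i.e. tanh(b * h)."""
    return math.tanh(beta * h)

def concept_relatedness(p: float, h: float,
                        alpha: float = 0.2, beta: float = 0.5) -> float:
    """Equation 4: product of the path and depth functions."""
    return path_score(p, alpha) * depth_score(h, beta)

# Hypothetical inputs: two concepts 2 edges apart whose common ancestor is at depth 3.
score = concept_relatedness(p=2, h=3)
```

A shorter path and a deeper common ancestor both push the score toward 1, which matches the intent stated for the first step of the approach.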
  • 15. Semantic Relatedness Between Words
      Algorithm 1: Semantic Relatedness Between Words
      1: If two words w1 and w2 have different POS, consider them semantically distant. Return 0.
      2: If w1 and w2 have the same POS and look the same but do not exist in WordNet, consider them semantically close. Return 1.
      3: Using either maximum scores or the first sense heuristic to perform WSD, measure the semantic relatedness between w1 and w2 using Equation 4.
      4: Using the same WSD strategy as the previous step, measure the semantic relatedness between the stemmed w1 and the stemmed w2 using Equation 4.
      5: Return the larger of the two results in steps (3) and (4), i.e., the score of the pair that is semantically closer.
  • 16. Construct a List of Joint Words
      To measure the semantic relatedness between sentences $S_1$ and $S_2$, first join them into a unique word set $S$ with a length of $n$:
      $S = S_1 \cup S_2 = \{w_1, w_2, \ldots, w_n\}$ (5)
      S1: introduction to computer programming
      S2: introduction to computing environments
      S: introduction to computer programming computing environments
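A small sketch of Equation 5, assuming whitespace tokenization and first-occurrence ordering (both are assumptions made for illustration; the slide does not specify them). The function name is hypothetical.

```python
def joint_word_set(s1: str, s2: str) -> list[str]:
    """Union of the words of two sentences, keeping first-occurrence order."""
    seen = set()
    joint = []
    for word in s1.split() + s2.split():
        if word not in seen:
            seen.add(word)
            joint.append(word)
    return joint

joint = joint_word_set("introduction to computer programming",
                       "introduction to computing environments")
```

With the two sentences from the slide, `joint` holds the six words listed for S, in the same order.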
  • 17. Construct a Lexical Semantic Vector
      Algorithm 2: Lexical Semantic Vector $\hat{s}_1$ for $S_1$
      1: for all words $w_i \in S$ do
      2: if $w_i \in S_1$, set $\hat{s}_{1i} = 1$ where $\hat{s}_{1i} \in \hat{s}_1$.
      3: if $w_i \notin S_1$, the semantic relatedness between $w_i$ and each word $w_{1j} \in S_1$ is calculated using Algorithm 1. Set $\hat{s}_{1i}$ to the highest score if the score exceeds a preset threshold $\delta$ ($\delta \in [0, 1]$), otherwise $\hat{s}_{1i} = 0$.
      4: Let $\gamma \in [1, n]$ be the maximum number of times a word $w_{1j} \in S_1$ is chosen as semantically the closest word of $w_i$. Let the semantic relatedness of $w_i$ and $w_{1j}$ be $d$, and $f_{1j}$ be the number of times that $w_{1j}$ is chosen. If $f_{1j} > \gamma$, set $\hat{s}_{1i} = d / f_{1j}$ to give a penalty to $w_{1j}$. This step is called ticketing.
      5: end for
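Steps 1 to 3 of Algorithm 2 can be sketched as follows. This is a simplified illustration that leaves out the ticketing penalty of step 4, and it takes the word-level measure of Algorithm 1 as a parameter. The `toy_word_sim` function and its score of 0.8 are invented stand-ins, not values from the slides.

```python
from typing import Callable, List

def lexical_semantic_vector(joint: List[str], sentence: List[str],
                            word_sim: Callable[[str, str], float],
                            delta: float = 0.2) -> List[float]:
    """One entry per joint word: 1 if present, else best score above delta, else 0."""
    present = set(sentence)
    vector = []
    for w in joint:
        if w in present:
            vector.append(1.0)
        else:
            best = max((word_sim(w, other) for other in sentence), default=0.0)
            vector.append(best if best > delta else 0.0)
    return vector

def toy_word_sim(a: str, b: str) -> float:
    """Hypothetical stand-in for Algorithm 1."""
    return 0.8 if {a, b} == {"computer", "computing"} else 0.0

joint = ["introduction", "to", "computer", "programming", "computing", "environments"]
s1 = ["introduction", "to", "computer", "programming"]
vec = lexical_semantic_vector(joint, s1, toy_word_sim)
```

Under these toy scores, "computing" borrows its value from "computer", while "environments" has no close word in S1 and gets 0.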
  • 18. First-level Sentence Relatedness
      TF-IDF: $TFIDF(w_i) = tf_i \cdot idf_i = tf_i \cdot \log \dfrac{N}{df_i}$ (6)
      Semantic vector $SV_1$ for sentence $S_1$: $SV_{1i} = \hat{s}_{1i} \cdot (TFIDF(w_i) + \epsilon) \cdot (TFIDF(w_{1j}) + \epsilon), \quad i \in [1, n],\ j \in [1, t]$ (7)
  • 19. First-level Sentence Relatedness
      $f_{sent}^{(1)}(S_1, S_2) = \dfrac{SV_1 \cdot SV_2}{\|SV_1\| \cdot \|SV_2\|}$ (8)
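Equations 6 and 8 are a standard TF-IDF weight and a cosine similarity. The sketch below assumes the natural logarithm (the slides do not state the base) and returns 0 for a zero vector, which is a convention chosen here. Function names are illustrative.

```python
import math
from typing import Sequence

def tfidf(tf: float, df: float, n_docs: float) -> float:
    """Equation 6: term frequency times log(N / document frequency)."""
    return tf * math.log(n_docs / df)

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Equation 8: dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for empty vectors, not taken from the slides
    return dot / (norm_u * norm_v)

# Hypothetical semantic vectors for two sentences over a four-word joint set.
similarity = cosine([1.0, 0.5, 0.0, 0.2], [1.0, 0.0, 0.4, 0.2])
```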
  • 20. Second-level Sentence Relatedness
      Word order similarity: $f_{order}(S_1, S_2) = 1 - \dfrac{\|Q_1 - Q_2\|}{\|Q_1 + Q_2\|}$ (9)
      $Q_1$, $Q_2$: word order vectors of $S_1$ and $S_2$.
      Second-level sentence relatedness: $f_{sent}^{(2)}(S_1, S_2) = \tau \cdot f_{sent}^{(1)}(S_1, S_2) + (1 - \tau) \cdot f_{order}(S_1, S_2), \quad \tau \in [0, 1]$ (10)
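Equations 9 and 10 can be tried on the “a dog bites a man” example. The slides do not say how the word order vectors are built, so this sketch assumes a simple exact-match scheme: each joint word maps to the 1-based position of its first occurrence in the sentence, or 0 if absent. All function names are hypothetical.

```python
import math
from typing import List

def word_order_vector(sentence: List[str], joint: List[str]) -> List[int]:
    """Assumed scheme: position of each joint word in the sentence, 0 if missing."""
    positions = {}
    for idx, word in enumerate(sentence, start=1):
        positions.setdefault(word, idx)
    return [positions.get(w, 0) for w in joint]

def order_similarity(q1: List[int], q2: List[int]) -> float:
    """Equation 9: 1 - ||Q1 - Q2|| / ||Q1 + Q2||."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(q1, q2)))
    return 1.0 if total == 0.0 else 1.0 - diff / total

def second_level(f_sent1: float, f_order: float, tau: float) -> float:
    """Equation 10: weighted mix of vector similarity and word order similarity."""
    return tau * f_sent1 + (1.0 - tau) * f_order

joint = ["a", "dog", "bites", "man"]
q1 = word_order_vector("a dog bites a man".split(), joint)
q2 = word_order_vector("a man bites a dog".split(), joint)
f_order = order_similarity(q1, q2)
```

The two sentences share every word, so a bag-of-words comparison cannot tell them apart; the word order term is what lowers their combined score below 1.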
  • 21. Semantic Relatedness Between Paragraphs
      $f_{para}(P_1, P_2) = \dfrac{\sum_{i=1}^{n} \left( \max_{j=1}^{m} f_{sent}(s_{1i}, s_{2j}) \right) \cdot N_i}{\sum_{i=1}^{n} N_i}$ (11)
      Algorithm 3: Semantic Relatedness for Paragraphs
      1: If deletion is enabled, given two course descriptions, select the one with fewer sentences as $P_1$, and the other as $P_2$. If deletion is disabled, select the first course description as $P_1$, and the other as $P_2$.
      2: for each sentence $s_{1i} \in P_1$ do
      3: Calculate the semantic relatedness between sentences using Equation 10 for $s_{1i}$ and each of the sentences in $P_2$.
      4: Find the sentence pair $(s_{1i}, s_{2j})$, $s_{2j} \in P_2$, that scores the highest. Save the highest score and the total number of words of $s_{1i}$ and $s_{2j}$. If deletion is enabled, remove sentence $s_{2j}$ from $P_2$.
      5: end for
      6: Collect the highest score and the number of words from each run. Use their weighted mean from Equation 11 as the semantic relatedness between $P_1$ and $P_2$.
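Algorithm 3 and Equation 11 can be sketched with the sentence measure passed in as a function. The `jaccard` function below is only a stand-in so the example runs; the slides use Equation 10 at that point. Whitespace word counting is also an assumption.

```python
from typing import Callable, List

def paragraph_relatedness(p1: List[str], p2: List[str],
                          sent_sim: Callable[[str, str], float],
                          deletion: bool = False) -> float:
    """Weighted mean of best-match sentence scores, weights = word counts of each pair."""
    if deletion and len(p2) < len(p1):
        p1, p2 = p2, p1  # step 1: the paragraph with fewer sentences becomes P1
    remaining = list(p2)
    weighted_sum, total_words = 0.0, 0
    for s1 in p1:
        if not remaining:
            break
        # steps 3-4: score s1 against every remaining sentence and keep the best
        best_idx, best_score = max(
            ((j, sent_sim(s1, s2)) for j, s2 in enumerate(remaining)),
            key=lambda pair: pair[1])
        n_words = len(s1.split()) + len(remaining[best_idx].split())
        weighted_sum += best_score * n_words
        total_words += n_words
        if deletion:
            remaining.pop(best_idx)
    return weighted_sum / total_words if total_words else 0.0

def jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in for the sentence measure of Equation 10."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

score = paragraph_relatedness(["a b", "c d"], ["a b", "e f"], jaccard)
```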
  • 22. Semantic Relatedness Between Courses
      $f_{course}(C_1, C_2) = \theta \cdot f_{sent}(T_1, T_2) + (1 - \theta) \cdot f_{para}(P_1, P_2), \quad \theta \in [0, 1]$ (12)
  • 23. Data sets
      Data set   MCC courses   UML courses   Total
      Small           25            24          49
      Medium          55            50         105
      Large          108            89         197
      Table: Number of courses in the data sets.
  • 24. Experimental Results
      Compared against the method by Li et al. [LMB+06] and TF-IDF [SB88].
      [Figure, two panels: “Accuracy Comparison” (accuracy versus number of documents) and “Average ranks of the real equivalent courses” (average rank versus number of documents), each plotted for 49, 105, and 197 documents. Series: enable word order, disable word order, TFIDF, Li, best case.]
  • 25. Experimental Results
      Performance of two word sense disambiguation algorithms.
      [Figure: “Accuracy Comparison of WSD” (accuracy versus number of documents), plotted for 49, 105, and 197 documents. Series: FIRST SENSE, MAX, best case.]
  • 26. What’s Wrong with WordNet?
      91.304 Foundations of Computer Science: A survey of the mathematical foundations of Computer Science. Finite automata and regular languages. Stack Acceptors and Context-Free Languages. Turing Machines, recursive and recursively enumerable sets. Decidability. Complexity. This course involves no computer programming.
      64 unfiltered words fetched from WordNet: acceptor, adjust, arrange, automaton, basis, batch, bent, calculator, car, class, complexity, computer, countable, course, determine, dress, even, finite, fix, foundation, foundation garment, fructify, hardening, imply, initiation, involve, jell, language, linguistic process, lyric, machine, mathematical, naturally, necessitate, numerical, path, place, plant, push-down list, push-down storage, put, recursive, regular, review, rig, run, science, set, set up, sic, sketch, skill, smokestack, specify, speech, stack, stage set, surveil, survey, terminology, turing, typeset, unconstipated, view.
  • 27. What’s Wrong with WordNet?
      91.304 Foundations of Computer Science: A survey of the mathematical foundations of Computer Science. Finite automata and regular languages. Stack Acceptors and Context-Free Languages. Turing Machines, recursive and recursively enumerable sets. Decidability. Complexity. This course involves no computer programming.
      18 articles fetched from Wikipedia using the second approach: Alan Turing, Algorithm, Automata theory, Complexity, Computer, Computer science, Context-free language, Enumeration, Finite set, Finite-state machine, Kolmogorov complexity, Language, Machine, Mathematics, Recursive, Recursive language, Recursively enumerable set, Set theory. See slide 33.
  • 28. Growth of Wikipedia and WordNet over the years
      [Figure: “Growth of English Wikipedia and WordNet” (article/synset count versus year, 1992 to 2012). Series: articles in Wikipedia, synsets in WordNet.]
  • 29. WordNet versus Wikipedia
      Fragments of WordNet and Wikipedia taxonomies:
      WordNet [Root: synset(“technology”), depth: 2]: 25 nodes
      Wikipedia [Centroid: “Category:Technology”, steps: 2]: 3,583 nodes
  • 30. Extract a Lexicographical Hierarchy from Wikipedia
      1 Assume the knowledge domain is specified, e.g., “Category:Computer science.”
      2 Choose its parent as the root, i.e., “Category:Applied sciences.”
      3 Use a depth-limited search to recursively traverse each subcategory (including subpages) to build a lexicographical hierarchy with depth D.
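The depth-limited traversal in step 3 can be sketched over an in-memory graph. The toy `children` mapping below is invented for illustration and does not reflect Wikipedia's real category structure; recording the shallowest depth for a node reached by several paths is also an assumption.

```python
from typing import Dict, List

def build_hierarchy(root: str, children: Dict[str, List[str]],
                    max_depth: int) -> Dict[str, int]:
    """Depth-limited search from root; returns each reached node with its depth."""
    depth_of = {root: 0}

    def visit(node: str, depth: int) -> None:
        if depth == max_depth:
            return
        for child in children.get(node, []):
            # Revisit only when a shorter route is found; this also stops cycles.
            if child not in depth_of or depth_of[child] > depth + 1:
                depth_of[child] = depth + 1
                visit(child, depth + 1)

    visit(root, 0)
    return depth_of

# Hypothetical toy graph (not actual Wikipedia data).
toy_children = {
    "Category:Applied sciences": ["Category:Computer science", "Category:Engineering"],
    "Category:Computer science": ["Algorithm", "Category:Algorithms"],
    "Category:Algorithms": ["Sorting algorithm", "Category:Computer science"],
}
hierarchy = build_hierarchy("Category:Applied sciences", toy_children, max_depth=2)
```

With `max_depth=2` the page under “Category:Algorithms” is cut off, which mirrors how the depth D bounds the size of the extracted hierarchy.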
  • 31. Growth of the Hierarchy from Wikipedia
      [Figure: three circular trees. Depth 1: 72 total nodes. Depth 2: 4,249 total nodes. Depth 3: 64,407 total nodes.]
      Growth of the lexicographical hierarchy constructed from Wikipedia, illustrated in circular trees. A lighter color of the nodes and edges indicates that they are at a deeper depth in the hierarchy.
  • 32. Lexicographical Hierarchy constructed from Wikipedia
      Depth (D)   Number of concepts at this level
      1                  71
      2               4,177
      3              60,158
      4             177,955
      5             494,039
      6           1,848,052
      Table: Number of concepts for each depth in the “Category:Applied sciences” hierarchy.
      The hierarchy includes only 1,534,267 distinct articles, out of 5,329,186 articles in Wikipedia. ⇒ Over 71% of Wikipedia articles are eliminated.
  • 33. Generate Course Description Features
      Algorithm 4: Feature Generation (F) for Course C
      1: $T_c \leftarrow \emptyset$ (clear terms), $T_a \leftarrow \emptyset$ (ambiguous terms).
      2: Generate all possible n-grams ($n \in [1, 3]$) G from C.
      3: Fetch the pages whose titles match any of $g \in G$ from Wikipedia redirection data. For each page pid of term t, $T_c \leftarrow T_c \cup \{t : pid\}$.
      4: Fetch the pages whose titles match any of $g \in G$ from Wikipedia page title data. If a page is a disambiguation page, include all the terms this page refers to. If a page pid corresponds to a term t that is not ambiguous, $T_c \leftarrow T_c \cup \{t : pid\}$, else $T_a \leftarrow T_a \cup \{t : pid\}$.
      5: For each term $t_a \in T_a$, find the disambiguation that is on average most related, using Equation 4, to the set of clear terms. If a page pid of $t_a$ is on average the most related to the terms in $T_c$, and the relatedness score is above a threshold $\delta$ ($\delta \in [0, 1]$), set $T_c \leftarrow T_c \cup \{t_a : pid\}$. If $t_a$ and a clear term are different senses of the same term, keep the one that is more related to all the other clear terms.
      6: Return clear terms as features.
      See slide 27.
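Step 2 of Algorithm 4, together with a much simplified title lookup, can be sketched as below. The `title_index` dictionary is a hypothetical stand-in for the Wikipedia page title data, and lowercase matching is an assumption; the two page ids are the ones shown for “Design” and “Object-oriented design” in the course feature example. Redirects and disambiguation (steps 3 to 5) are not modeled.

```python
from typing import Dict, List

def ngrams(text: str, max_n: int = 3) -> List[str]:
    """All word n-grams of the text for n = 1 .. max_n, shortest first."""
    words = text.split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams

def match_titles(grams: List[str], title_index: Dict[str, int]) -> Dict[str, int]:
    """Keep the n-grams whose lowercase form is a known page title."""
    return {g: title_index[g.lower()] for g in grams if g.lower() in title_index}

# Hypothetical title index keyed by lowercase title.
title_index = {"design": 8560, "object-oriented design": 6111038}
grams = ngrams("Object-oriented design")
features = match_titles(grams, title_index)
```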
  • 34. Example of Course Features
      C1 (title): {1134: “Analysis”, 775: “Algorithm”}
      C1 (description): {41985: “Shortest path problem”, 597584: “Tree traversal”, 455770: “Spanning tree”, 18955875: “Tree”, 1134: “Analysis”, 18568: “List of algorithms”, 56054: “Completeness”, 775: “Algorithm”, 144656: “Sorting”, 8519: “Data structure”, 93545: “Structure”, 8560: “Design”, 18985040: “Data”}
      C2 (title): {5213: “Computing”}
      C2 (description): {21347364: “Unix”, 289862: “Social”, 9258: “Ethics”, 6111038: “Object-oriented design”, 5311: “Computer programming”, 72038: “C++”, 27471338: “Object-oriented programming”, 8560: “Design”}
      See slide 6.
  • 35. Lexical Semantic Vector
      An algorithm similar to Algorithm 2 is used to determine each entry $\hat{s}_{1i}$ of the lexical semantic vector for features $F_1$. A semantic vector is defined as:
      $SV_{1i} = \hat{s}_{1i} \cdot I(t_i) \cdot I(t_j)$ (13)
  • 36. Information Content
      Information content $I(t)$ of a term $t$: $I(t) = \gamma \cdot I_c(t) + (1 - \gamma) \cdot I_l(t)$ (14)
      Category information content $I_c(t)$: $I_c(t) = 1 - \dfrac{\log(siblings(t) + 1)}{\log(N)}$ (15)
      Linkage information content $I_l(t)$: $I_l(t) = 1 - \dfrac{inlinks(pid)}{MAXIN} \cdot \dfrac{outlinks(pid)}{MAXOUT}$ (16)
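Equations 14 to 16 translate directly into three small functions. In this sketch N, MAXIN, and MAXOUT are simply passed in as numbers, since the slide does not define how they are obtained; the function names and the sample counts are illustrative only.

```python
import math

def category_ic(siblings: int, n: int) -> float:
    """Equation 15: fewer siblings in the hierarchy means higher information content."""
    return 1.0 - math.log(siblings + 1) / math.log(n)

def linkage_ic(inlinks: int, outlinks: int, max_in: int, max_out: int) -> float:
    """Equation 16: heavily linked pages get lower information content."""
    return 1.0 - (inlinks / max_in) * (outlinks / max_out)

def information_content(ic_category: float, ic_linkage: float, gamma: float) -> float:
    """Equation 14: gamma-weighted mix of the two components."""
    return gamma * ic_category + (1.0 - gamma) * ic_linkage

# Hypothetical counts for one term.
ic = information_content(category_ic(siblings=9, n=1000),
                         linkage_ic(inlinks=50, outlinks=20, max_in=1000, max_out=200),
                         gamma=0.5)
```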
  • 37. Determine Course Relatedness
      $f(C_1, C_2) = \dfrac{SV_1 \cdot SV_2}{\|SV_1\| \cdot \|SV_2\|}$ (17)
      $f(course_1, course_2) = \dfrac{f(T_1, T_2) \cdot (\|F_{T1}\| + \|F_{T2}\|) + f(C_1, C_2) \cdot (\|F_{C1}\| + \|F_{C2}\|)}{\|F_{T1}\| + \|F_{T2}\| + \|F_{C1}\| + \|F_{C2}\|} + \Omega$ (18)
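Equation 18 is a weighted mean of the title score and the description score, plus an offset. The sketch below reads each norm of a feature set as the number of features in it, which is an assumption, and treats the offset as a plain parameter because the slide does not define it. The function name is hypothetical.

```python
def course_relatedness(f_title: float, f_desc: float,
                       n_t1: int, n_t2: int, n_c1: int, n_c2: int,
                       omega: float = 0.0) -> float:
    """Equation 18 with feature-set sizes standing in for the norms (assumed)."""
    title_weight = n_t1 + n_t2
    desc_weight = n_c1 + n_c2
    weighted = f_title * title_weight + f_desc * desc_weight
    return weighted / (title_weight + desc_weight) + omega

# Feature counts from the course feature example: C1 has 2 title and 13
# description features, C2 has 1 and 8. The two scores are made up.
combined = course_relatedness(f_title=1.0, f_desc=0.0,
                              n_t1=2, n_t2=1, n_c1=13, n_c2=8)
```

Because descriptions usually yield more features than titles, the description score dominates the result under this reading.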
  • 38. Experimental Results
      Randomly select 25 CS courses from 19 universities that can be transferred to UML according to the transfer dictionary. Each transfer course is compared to all 44 CS courses offered at UML. The result is considered correct if the real equivalent course at UML is among the top 3 in the list of highest scores.
      Algorithm              Accuracy
      Proposed approach        72%
      Li et al. [LMB+06]       52%
      TF-IDF                   32%
      Table: Accuracy of the second approach against those of Li et al. and TF-IDF.
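The top-3 criterion described above can be sketched as a small evaluation helper. The course names and scores are invented toy data, and the function name is hypothetical.

```python
from typing import Dict

def top_k_accuracy(scores: Dict[str, Dict[str, float]],
                   truth: Dict[str, str], k: int = 3) -> float:
    """Share of external courses whose true equivalent ranks among the k best scores."""
    hits = 0
    for external, candidates in scores.items():
        ranked = sorted(candidates, key=candidates.get, reverse=True)[:k]
        if truth[external] in ranked:
            hits += 1
    return hits / len(scores)

# Hypothetical scores of two external courses against four local courses.
toy_scores = {
    "ext-A": {"u1": 0.9, "u2": 0.4, "u3": 0.3, "u4": 0.1},
    "ext-B": {"u1": 0.8, "u2": 0.7, "u3": 0.6, "u4": 0.5},
}
toy_truth = {"ext-A": "u2", "ext-B": "u4"}
accuracy = top_k_accuracy(toy_scores, toy_truth, k=3)
```

With 25 transfer courses, the reported accuracies of 72%, 52%, and 32% correspond to 18, 13, and 8 correct results.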
  • 39. Experimental Results
      Algorithm                             Pearson’s correlation   p-value
      TF-IDF                                       0.730            $2 \cdot 10^{-6}$
      Li et al. [LMB+06]                           0.570            0.0006
      Proposed approach (Features)                 0.845            $1.13 \cdot 10^{-9}$
      Proposed approach (Features + IC)            0.851            $6.65 \cdot 10^{-10}$
      Table: Pearson’s correlation of course relatedness scores with human judgments.
  • 40. Sensitivity Test
      Testing the sensitivity of parameters α, β, and δ.
      [Figure, three panels, each plotting Pearson correlation (0.0 to 1.0) against the varied parameter (0.1 to 0.9): “Pearson Correlation When α Changes (β = 0.5, δ = 0.2)”, “Pearson Correlation When β Changes (α = 0.2, δ = 0.2)”, and “Pearson Correlation When δ Changes (α = 0.2, β = 0.5)”.]
  • 41. Summary
      - Highlights the problem of suggesting transfer course equivalencies.
      - Proposes two semantic relatedness measures to tackle the problem.
      - A semantic relatedness measure based on traditional knowledge sources can be adapted.
      - Wikipedia is a better knowledge source than traditional knowledge sources.
      - A domain-specific semantic relatedness measure built on top of Wikipedia is well suited to suggesting transfer course equivalencies.
      - Provides a human judgment data set over 32 pairs of courses.
  • 42. Published Literature
      - Beibei Yang and Jesse M. Heines. Using Semantic Distance to Automatically Suggest Transfer Course Equivalencies. ACL-HLT 2011: Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications (BEA-6). Association for Computational Linguistics.
      - Beibei Yang and Jesse M. Heines. Domain-Specific Semantic Relatedness from Wikipedia: Can a Course be Transferred? NAACL-HLT 2012 Student Research Workshop.
  • 43. References: Bibliography I
      [BH06] Alexander Budanitsky and Graeme Hirst. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32:13–47, 2006.
      [BLL98] Curt Burgess, Kay Livesay, and Kevin Lund. Explorations in context space: words, sentences, discourse. Discourse Processes, 25:211–257, 1998.
      [BMI07] Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka. Measuring semantic similarity between words using web search engines. In Proceedings of the 16th International Conference on World Wide Web, pages 757–766, New York, NY, USA, 2007. ACM.
      [CV07] Rudi L. Cilibrasi and Paul M. B. Vitanyi. The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19:370–383, 2007.
      [GM07] Evgeniy Gabrilovich and Shaul Markovitch. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of the 20th International Joint Conference on AI, 2007.
      [GM09] Evgeniy Gabrilovich and Shaul Markovitch. Wikipedia-based semantic interpretation for NLP. Journal of Artificial Intelligence Research, 34:443–498, 2009.
      [HRWL83] Frederick Hayes-Roth, Donald A. Waterman, and Douglas B. Lenat. Building expert systems. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1983.
  • 44. References: Bibliography II
      [HSO98] Graeme Hirst and David St-Onge. Lexical chains as representations of context for the detection and correction of malapropisms. In WordNet: An electronic lexical database, pages 305–332. The MIT Press, Cambridge, MA, 1998.
      [KF93] Hideki Kozima and Teiji Furugori. Similarity between words computed by spreading activation on an English dictionary. In Proceedings of the 6th Conference on European Chapter of the Association for Computational Linguistics, EACL ’93, pages 232–239, Stroudsburg, PA, USA, 1993. Association for Computational Linguistics.
      [LBM03] Yuhua Li, Zuhair A. Bandar, and David McLean. An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering, pages 871–882, 2003.
      [LC98] Claudia Leacock and Martin Chodorow. Combining local context and WordNet similarity for word sense identification, pages 265–283. The MIT Press, Cambridge, MA, 1998.
      [LFL98] Thomas K. Landauer, Peter W. Foltz, and Darrell Laham. An introduction to latent semantic analysis. Discourse Processes, 25(2-3):259–284, 1998.
      [LMB+06] Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett. Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering, 18(8):1138–1150, 2006.
  • 45. References: Bibliography III
      [MH91] Jane Morris and Graeme Hirst. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17(1):21–48, March 1991.
      [Moh06] Distributional measures of concept-distance: A task-oriented evaluation. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, 2006.
      [Moh08] Saif Mohammad. Measuring Semantic Distance Using Distributional Profiles of Concepts. PhD thesis, University of Toronto, Toronto, Canada, 2008.
      [PS07] Simone Paolo Ponzetto and Michael Strube. Knowledge derived from Wikipedia for computing semantic relatedness. Journal of Artificial Intelligence Research, 30:181–212, October 2007.
      [Res95] Philip Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, volume 1 of IJCAI’95, pages 448–453, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc.
      [SB88] Gerard Salton and Christopher Buckley. Term weighting approaches in automatic text retrieval. Information Processing and Management, 24:513–523, August 1988.
      [SH06] Mehran Sahami and Timothy D. Heilman. A web-based kernel function for measuring the similarity of short text snippets. In Proceedings of the 15th International Conference on the World Wide Web, pages 377–386, New York, NY, USA, 2006. ACM.
  • 46. References: Bibliography IV
      [Tur01] Peter D. Turney. Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In Luc De Raedt and Peter A. Flach, editors, ECML, volume 2167 of Lecture Notes in Computer Science, pages 491–502. Springer, 2001.
      [WP94] Zhibiao Wu and Martha Palmer. Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pages 133–138, 1994.
      [YP05] Dongqiang Yang and David M. W. Powers. Measuring semantic similarity in the taxonomy of WordNet. In Proceedings of the 28th Australasian Conference on Computer Science, volume 38, pages 315–322, Darlinghurst, Australia, 2005. Australian Computer Society, Inc.