Language Technology Enhanced Learning
Fridolin Wild, The Open University, UK
Gaston Burek, University of Tübingen
Adriana Berlanga, Open University, NL
Workshop Outline
1 | Deep Introduction: Latent-Semantic Analysis (LSA)
2 | Quick Introduction: Working with R
3 | Experiment: Simple Content-Based Feedback
4 | Experiment: Topic Proxy
Latent-Semantic Analysis LSA
Latent Semantic Analysis
Assumption: language utterances do have a semantic structure. However, this structure is obscured by word usage (noise, synonymy, polysemy, …).
Proposed LSA solution: map the doc-term matrix using conceptual indices derived statistically (truncated SVD) and make similarity comparisons using e.g. angles.
Input (e.g., documents): { M }
Deerwester, Dumais, Furnas, Landauer, and Harshman (1990): Indexing by Latent Semantic Analysis. In: Journal of the American Society for Information Science, 41(6):391-407.
Only the red terms appear in more than one document, so strip the rest.
term = feature; vocabulary = ordered set of features
TEXTMATRIX
Singular Value Decomposition: M = T S D' (term vector matrix T, diagonal matrix of singular values S, document vector matrix D)
Truncated SVD: recombining only the first k singular values and their vectors, we will get a different matrix (different values, but still of the same format as M): the latent-semantic space.
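A minimal sketch of this truncation in base R, on a made-up toy matrix (the lsa package wraps exactly this step):

M = matrix(c(1,0,1,0, 0,1,1,0, 1,1,0,1), nrow=4)    # toy term-by-document matrix
s = svd(M)                                          # full SVD: M = T S D'
k = 2                                               # keep only the first k factors
Mk = s$u[,1:k] %*% diag(s$d[1:k]) %*% t(s$v[,1:k])  # reduced matrix, same format as M
round(Mk, 2)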
Reconstructed, Reduced Matrix
m4: Graph minors: A survey
Similarity in a Latent-Semantic Space (Landauer, 2007)
[figure: a query vector and two target vectors plotted on the X and Y dimensions; similarity to the query is given by angle 1 and angle 2]
doc2doc similarities
Unreduced = pure vector space model: based on M = T S D'; Pearson correlation over document vectors.
Reduced: based on M_2 = T S_2 D' (S truncated to the first two singular values); Pearson correlation over document vectors.
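A sketch of both comparisons with the lsa package (tm and space as created in the core workflow slide further below):

cor(tm[,1], tm[,2])         # unreduced: Pearson over raw document vectors
tm2 = as.textmatrix(space)  # reduced: reconstructs T S_k D'
cor(tm2[,1], tm2[,2])       # Pearson over reduced document vectors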
Configurations: 4 × 12 × 7 × 2 × 3 = 2016 combinations
Updating: Folding-In
SVD factor stability: different texts – different factors.
Challenge: avoid unwanted factor changes (e.g., bad essays).
Solution: folding-in instead of recalculating.
SVD is computationally expensive: 14 seconds (300 docs textbase), 10 minutes (3500 docs textbase) … and rising!
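With the lsa package, folding in is a single call; a minimal sketch, assuming an existing textmatrix tm and space (the directory name is hypothetical):

new = textmatrix("newessays/", vocabulary=rownames(tm))  # same vocabulary as the training data
new_red = fold_in(new, space)                            # project new docs without recomputing the SVD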
The Statistical Language and Environment R
 
Help
> ?'+'
> ?kmeans
> help.search("correlation")
http://www.r-project.org => site search => documentation
Mailing list: r-help
Task View NLP: http://cran.r-project.org/ -> Task Views -> NLP
Installation & Configuration install.packages("lsa", repos="http://cran.r-project.org") install.packages("tm", repos="http://cran.r-project.org") install.packages("network", repos="http://cran.r-project.org") library(lsa) setwd("d:/denkhalde/workshop") dir() ls() quit()
The lsa Package
Available via CRAN, e.g.: http://cran.at.r-project.org/src/contrib/Descriptions/lsa.html
Higher-level abstraction to ease use.
Five core methods:
- textmatrix() / query()
- lsa()
- fold_in()
- as.textmatrix()
Supporting methods for term weighting, dimensionality calculation, correlation measurement, triple binding.
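query() builds a pseudo-document over an existing vocabulary, which can then be folded in; a minimal sketch (directory and query text are made up):

tm = textmatrix("dir/")                         # training textmatrix
space = lsa(tm)
q = query("graph minors survey", rownames(tm))  # query vector over the same vocabulary
q_red = fold_in(q, space)                       # project the query into the space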
Core Processing Workflow
tm = textmatrix("dir/")                # read documents into a textmatrix
tm = lw_logtf(tm) * gw_idf(tm)         # local/global term weighting
space = lsa(tm, dims=dimcalc_share())  # create the latent-semantic space
tm3 = fold_in(tm, space)               # map documents into the space
as.textmatrix(space)                   # reconstruct the reduced matrix
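Folded-in documents can then be compared directly, e.g. with the package's cosine() (comparing the first two documents here purely as an example):

cosine(tm3[,1], tm3[,2])  # similarity of two documents in the space
cosine(tm3)               # full doc2doc similarity matrix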
A Simple Evaluation of Student Writings
Feedback
Evaluating Student Writings
External validation? Compare to human judgements! (Landauer, 2007)
How to do it...
library("lsa")  # load package
# load training texts
trm = textmatrix("trainingtexts/")
trm = lw_bintf(trm) * gw_idf(trm)  # weighting
space = lsa(trm)                   # create an LSA space
# fold in essays to be tested (including gold standard text)
tem = textmatrix("testessays/", vocabulary=rownames(trm))
tem = lw_bintf(tem) * gw_idf(trm)  # weighting (global weights from the training matrix)
tem_red = fold_in(tem, space)
# score an essay by comparing with
# gold standard text (very simple method!)
cor(tem_red[,"goldstandard.txt"], tem_red[,"E1.txt"])  # => 0.7
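The same comparison scales to a whole folder; a small sketch scoring every test essay against the gold standard (essay file names assumed):

essays = setdiff(colnames(tem_red), "goldstandard.txt")
machinescores = sapply(essays, function(e) cor(tem_red[,"goldstandard.txt"], tem_red[,e]))
sort(machinescores, decreasing=TRUE)  # essays ranked by closeness to the gold standard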
Evaluating Effectiveness
Compare machine scores with human scores.
Human-to-human correlation: usually around .6; increased by familiarity between assessors, tighter assessment schemes, …
Scores vary even more strongly with decreasing subject familiarity (.8 at high familiarity, worst test -.07).
Test collection: 43 German essays, scored from 0 to 5 points (ratio scaled), average length: 56.4 words.
Training collection: 3 'golden essays', plus 302 documents from a marketing glossary, average length: 56.1 words.
(Positive) Evaluation Results
LSA machine scores:
  Spearman's rank correlation rho
  data: humanscores[names(machinescores), ] and machinescores
  S = 914.5772, p-value = 0.0001049
  alternative hypothesis: true rho is not equal to 0
  sample estimates: rho = 0.687324
Pure vector space model:
  Spearman's rank correlation rho
  data: humanscores[names(machinescores), ] and machinescores
  S = 1616.007, p-value = 0.02188
  alternative hypothesis: true rho is not equal to 0
  sample estimates: rho = 0.4475188
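Output of this shape is what R's cor.test() prints for a Spearman test; the call behind it presumably looked like:

cor.test(humanscores[names(machinescores), ], machinescores, method="spearman")  # rank correlation of human vs. machine scores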
Concept-Focused Evaluation (using http://eczemablog.blogspot.com/feeds/posts/default?alt=rss)
Visualising Lexical Semantics Topic Proxy
Network Visualisation
Term-2-term distance matrix = Graph

       t1     t2     t3    t4
t1     1
t2    -0.2    1
t3     0.5    0.7    1
t4     0.05  -0.5    0.3   1
Classical Landauer Example
tl = landauerSpace$tk %*% diag(landauerSpace$sk)  # term vectors, scaled by singular values
dl = landauerSpace$dk %*% diag(landauerSpace$sk)  # document vectors, scaled
dtl = rbind(tl, dl)                               # stack terms and documents
s = cosine(t(dtl))                                # pairwise cosine similarities
s[which(s < 0.8)] = 0                             # keep only strong links
plot(network(s), displaylabels=T,
     vertex.col = c(rep(2,12), rep(3,9)))         # colour the 12 terms vs. the 9 documents
Divisive Clustering (Diana)
edmedia Terminology
Code Sample
library(cluster)   # diana()
library(sna)       # betweenness()
library(network)   # network()
d2000 = cosine(t(dtm2000))                     # dtm2000: an existing textmatrix (terms x docs)
dianac2000 = diana(d2000, diss=T)              # divisive clustering
clustersc2000 = cutree(as.hclust(dianac2000), h=0.2)
plot(dianac2000, which.plot=2, cex=.1)         # dendrogramme
winc = clustersc2000[which(clustersc2000==1)]  # filter for cluster 1
wincn = names(winc)
d = d2000[wincn, wincn]
d[which(d<0)] = 0                              # zero out negative similarities
btw = betweenness(d, cmode="undirected")       # for node size calc
btwmax = colnames(d)[which(btw==max(btw))]
btwcex = (btw/max(btw))+1
plot(network(d), displayisolates=F, displaylabels=T, boxed.labels=F,
     edge.col="gray", main=paste("cluster", i),  # i: cluster index from an enclosing loop
     usearrows=F, vertex.border="darkgray", label.col="darkgray",
     vertex.cex=btwcex*3, vertex.col=8-(colnames(d) %in% btwmax))
Permutating Permutation
Permutation Test
Non-parametric: does not assume that the data have a particular probability distribution.
Suppose the following ranking of elements of two categories X and Y.
Actual data to be evaluated: (x_1, x_2, y_1) = (1, 9, 3).
Let T(x_1, x_2, y_1) = abs(mean X - mean Y) = |5 - 3| = 2.
Permutation
Usually, it is not practical to evaluate all N! permutations. We can approximate the p-value by sampling randomly from the set of permutations.
The permutations are:

permutation   value of T
--------------------------
(1,9,3)       2   (actual data)
(9,1,3)       2
(1,3,9)       7
(3,1,9)       7
(3,9,1)       5
(9,3,1)       5
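A minimal sketch reproducing this table in R; with only 3! = 6 orderings, exact enumeration is easy (for larger samples, sample permutations randomly, as noted above):

T_stat = function(d) abs(mean(d[1:2]) - mean(d[3]))  # |mean X - mean Y|
perms = list(c(1,9,3), c(9,1,3), c(1,3,9), c(3,1,9), c(3,9,1), c(9,3,1))
T_all = sapply(perms, T_stat)                        # 2 2 7 7 5 5
T_obs = T_all[1]                                     # actual data (1,9,3): T = 2
p = mean(T_all >= T_obs)                             # share at least as extreme; 1 for this toy data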
Some Results
Student discussions on safe prescribing: classified according to expected learning outcomes and related subtopics.
Topics: A=7, B=12, C=53, D=4, E=40, F=7.
Graded: poor, fair, good, excellent.
Methodology used: LSA, bag of words / maximal repeated phrases, permutation test.
Challenging Questions Discussion
Questions
- Dangers of using language technology?
- Ontologies = neat? NLP = nasty?
- Other possible application areas?
- Corpus collection?
- What is good effectiveness? When can we say that an algorithm works well?
- Other aspects not evaluated…
Questions? #eof.
