Distance Metric Learning

Distance Metric Learning
2014-03-06
전상혁
2014-03-05 1

Introduction
•DistanceMetriclearningistolearnadistancemetricfortheinputspaceofdata
•Dataisfromagivencollectionofpairofsimilar/dissimilarpointsthatpreservesthedistancerelationamongthetrainingdata
•Learnedmetriccansignificantlyimprovetheperformanceofalgorithmssuchasclassification,clusteringandmanyothertasks
2014-03-05 2

Distance Metric
•Adistancemetricordistancefunctionisafunctionthatdefinesadistancebetweenelementsofaset
•Example1:Euclideanmetric
•Example2:Discretemetric
•푖푓푥=푦푡ℎ푒푛푑푥,푦=0,푂푡ℎ푒푟푤푖푠푒,푑푥,푦=1
•AmetriconasetXisafunction푑:푋×푋→퐑(whereRisrealnumber)
•푑푥,푦≥0
•푑푥,푦=0푖푓푎푛푑표푛푙푦푖푓푥=푦
•푑푥,푦=푑(푦,푥)
•푑푥,푧≤dx,y+d(y,z)
2014-03-05 Aurelien Bellet, Tutorial on Metric Learning, 2013 3

Introduction
2014-03-05 Bellet, A., Habrard, A., and Sebban, M. A Survey on Metric Learning for Feature Vectors and Structured Data, 2013 4
Bellet, A., Habrard, A., and Sebban, M. A Survey on Metric Learning for Feature Vectors and Structured Data, 2013

Categories of distance metric learning
•Superviseddistancemetriclearning
•Constraintsorlabelsgiventothealgorithm
•Equivalenceconstraintswherepairsofdatapointsthatbelongtothesameclasses(semantically-similar)
•Inequivalenceconstraintswherepairsofdatapointsthatbelongtothedifferentclasses(semantically-dissimilar)
•localadaptivesuperviseddistancemetriclearning,NCA,RCA
•Unsuperviseddistancemetriclearning
•Dimensionalityreductiontechniques
•Linearmethods(PCA,MDS)andnon-linearmethods(ISOMAP,LLE,LE)
•Kernelmethodstowardslearningdistancemetrics
•Kernelalignment,extensionworkoflearningidealizedkernel
2014-03-05 Liu Yang, An Overview of Distance Metric Learning, 2007 5
Liu Yang, Distance Metric Learning: A Comprehensive Survey, 2005
Liu Yang, An Overview of Distance Metric Learning, 2007

•Maximummarginbaseddistancemetriclearning
•Formulatedistancemetriclearningasaconstrainedconvexprogrammingproblem,andattempttolearncompletedistancemetricfromtrainingdata
•Expensivecomputation,overfittingproblemwhenlimitedtrainingsamplesandlargefeaturedimension
•Largemarginnearestneighborclassification(LMNN),SDPmethodstosolvethekernelizedmarginmaximization
Liu Yang, Distance Metric Learning: A Comprehensive Survey, 2005

Large Margin Nearest Neighbor Classification (LMNN)
KilianWeinberger,etal.Distancemetriclearningforlargemarginnearestneighborclassification.NIPS,2006.
2014-03-05 Kilian Weinberger, et al. Distance metric learning for large margin nearest neighbor classification. NIPS, 2006. 8

Motivation –better KNN
•KNN:simplenon-parametricclassificationmethod(Alsocanbeusedforclustering,regression,anomalydetection,andetc.)
•푘isuser-definedconstant,andanunlabeledvectorisclassifiedbyassigningthelabelwhichismostfrequentamongthe푘trainingsamplesnearesttothatunlabeledvector
•Drawback:whentheclassdistributionisskewed,basic“majorityvoting” classificationmaynotworkswell.

Introduction
•Largemarginnearestneighbor(LMNN)method
•NeighborsareselectedbyusingthelearnedMahanalobisdistancemetric
•LearnaMahanalobisdistancemetricforkNNclassificationbysemidefiniteprogramming
•Find 퐿:푅푑→푅푑,which will be used to compute squared distance as 퐷푥푖,푥푗=퐿푥푖−푥푗 2by minimizing some cost function 휖(퐿)

MahanalobisDistance Metric
•Relativemeasureofadatapoint’sdistancefromacommonpoint
•ItdiffersfromEuclideandistanceinthatittakesintoaccountthecorrelationsofthedatasetandisscale-invariant
•푑 푥, 푦= 푥− 푦⊤훀( 푥− 푦)where훀ispositivesemidefintematrix
•Example:thedistancebetweentworandomvariable 푥, 푦ofthesamedistributionwithcovariancematrix푆isdefinedas
•푑 푥, 푦= 푥− 푦⊤푆−1( 푥− 푦)
•EuclideandistanceisspecialcasewhenSisidentitymatrix
•Recall:distance퐷푥푖,푥푗=퐿푥푖−푥푗 2
•Wecanrewriteeq.as퐷푥푖,푥푗=푥푖−푥푗퐌⊤푥푖−푥푗wherethematrix퐌= 퐋⊤퐋,parameterizestheMahalanobisdistancemetricinducedbythelineartransform퐋

SemidefiniteProgramming
•Akindofconvexoptimizationproblem
•푥isavectorofnvariables,푐isaconstantndimensionalvector
•Linearprogramming(LP)
•푚푖푛푖푚푖푧푒푐∙푥 푠.푡.푎푖∙푥=푏푖,푖=1,…,푚 푥∈푅+푛
•Xispositivesemidefinitematrixif푣⊤푋푣≥0푓표푟푎푛푦푣∈푅푛(푋≽0)
•If퐶(푋)isalinearfunctionofX,then퐶(푋)canbewrittenas퐶∙푋
•Semidefiniteprogram(SDP)
•푚푖푛푖푚푖푧푒퐶∙푋 푠.푡.퐴푖∙푋=푏푖,푖=1,…,푚 푋≽0

Cost Function
•휖퐿= 푖푗휂푖푗퐋푥푖−푥푗 2+푐 푖푗푙휂푖푗1−푦푖푙1+퐋푥푖−푥푗 2−퐋푥푖−푥푙 2+
•Let푥푖,yii=1ndenoteatrainingsetofnlabeledexampleswithinputs푥푖∈푅푑anddiscrete(butnotnecessarilybinary)classlabels푦푖
•푦푖푗∈{0,1}isbinarymatrixtoindicatewhetherornotthelabels푦푖and푦푗match
•휂푖푗∈{0,1}toindicatewhetherinput푥푗isatargetneighborofinput푥푖,thisvalueisfixedanddoesnotchangeduringlearning
•Targetneighbor:eachinstance푥푖hasexactlykdifferenttargetneighborswithindataset,whichallsharethesameclasslabel푦푖
•Targetneighborsarethedatapointsthatshouldbecomenearestneighborsunderthelearnedmetric
•C.f.Impostors:sameastargetneighborexceptithasdifferentclasslabel
•Term푧+=max(푧,0)
•푐>0somepositiveconstant(typicallysetbycrossvalidation)

Cost Function
•휖퐿= 푖푗휂푖푗퐋푥푖−푥푗 2+푐 푖푗푙휂푖푗1−푦푖푙1+퐋푥푖−푥푗 2−퐋푥푖−푥푙 2+
•Thefirstoptimizationgoalisachievedbyminimizingtheaveragedistancebetweeninstanceandtheirtargetneighbors
•
•Thesecondgoalisachievedbyconstrainingimpostors푥푙tobeoneunitfurtherawaythantargetneighbors푥푗
•
•푁푖denotethesetoftargetneighborsforadatapoint푥푖

2014-03-05 15
Nakul Verma, A tutorial on Metric Learning with some recent advances

Convex Optimization
•Recall:distance퐷푥푖,푥푗=퐿푥푖−푥푗 2
•Recall:Wecanrewriteeq.as퐷푥푖,푥푗=푥푖−푥푗퐌⊤푥푖−푥푗wherethematrix퐌=퐋⊤퐋,parameterizestheMahalanobisdistancemetricinducedbythelineartransform퐋
•Thehingelosscanbe“mimicked”byintroducingslackvariable휁푖푗forallpairsofdifferentlylabeledinputs(i.e.,forall<푖,푗>suchthat푦푖푗=0)

Results
•PCAwasusedtoreducethedimensionalityofimage,speech,andtextdata,bothtospeeduptrainingandavoidoverfitting
•Boththenumberoftargetneighbors(k)andtheweightingparameter(c) inoptimizationproblemweresetbycrossvalidation
•KNNEuclideandistance
•KNNMahalanobisdistance
•Energybasedclassification
•
•MulticlassSVM

Results

GBLMNN: Non-linear method for LMNN
•Buildingnon-linearmappingformetriclearning
•GBRT:GradientBoostingRegressionTree
•Code:http://www.cse.wustl.edu/~kilian/code/code.html
2014-03-05 Kedem, D., Xu, Z., & Weinberger, K. (n.d.). Gradient Boosted Large Margin Nearest Neighbors, (1), 10–12. 19

Large Margin Component Analysis (LMCA)
Torresani,L.,&Lee,K.(2007).Largemargincomponentanalysis.AdvancesinNeuralInformationProcessing
2014-03-05 Torresani, L., & Lee, K. (2007). Large margin component analysis. Advances in Neural Information Processing 20

Problem: large number of features
•Inproblemsinvolvingthousandsoffeatures,distancemetriclearningcannotbeused
•Highcomputationcomplexity
•Overffitingproblem
•Previousworkshavereliedonatwo-stepsolution
•Applydimensionreductionalgorithm(e.g.PCA)
•Learnametricintheresultinglow-dimensionalsubspace
•LMCA:providebettersolutionforhighdimensionalproblem

Linear Dimensionality Reduction for KNN (1/3)
•Recall:costfunctionofmetricfunction
•Inpreviouswork,Lwas퐷×퐷matrix
•Idea:learnnonsquarematrixsize푑×퐷,푤푖푡ℎ푑≪퐷
•Problem:costfunction휖퐌,푤ℎ푒푟푒퐌=퐋⊤퐋isnolongerconvex
•Mhaslowrankd,notD

•Idea:optimizetheobjectivefunctiondirectlywithrespecttothenonsquarematrixL
•Although,equationisnon-convex,itisbetterthanminimizingtheobjectivefunctionwithrespecttoLratherthanwithrespecttotherank- deficientD×DmatrixM
•Sincethisoptimizationinvolvesonly푑∙퐷ratherthan퐷2unknowns,itcanreducetheriskofoverffiting
•“theoptimalrectangularmatrixLcomputedwithourmethodautomaticallysatisfiestherankconstraintsonMwithoutrequiringthesolutionofdifficultconstrainedminimizationproblems”

•Minimizecostfunctionusinggradient-basedoptimizers
•“Wehandlethenon-differentiabilityofh(s)ats=0,byadoptingasmoothhingefunctionasin[8]”
[8] J. D. M. Rennieand N. Srebro. Fast maximum margin matrix factorization for collaborative prediction.
In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005.

Nonlinear Feature Extraction (1/2)
•“Ourapproachlearnsalow-rankMahalanobisdistancemetricinahighdimensionalfeaturespaceF,relatedtotheinputsbyanonlinearmap 휙:푅퐷→퐹”
•kiskernelfunctioncomputedbythefeatureinputproducts
•LisnowatransformationfromthespaceFintoa푅푑(푑≪퐷)
•

Nonlinear Feature Extraction (2/2)
•
•
•
•UpdateΩ←(Ω−휆Γ)untilconverge
•Forclassification,weprojectpointsontothelearnedlow-dimspacebyexploitingthekerneltrick:퐿흓=Ω퐤

Results –High dimensional dataset

Results –low dimensional dataset

Reference
•Bellet,A.,Habrard,A.,andSebban,M.ASurveyonMetricLearningforFeatureVectorsandStructuredData,2013
•LiuYang,DistanceMetricLearning:AComprehensiveSurvey,2005
•LiuYang,AnOverviewofDistanceMetricLearning,2007
•MarcoCuturi.KAISTMachineLearningTutorialMetricsandKernelsAfewrecenttopics,2013
•NakulVerma,AtutorialonMetricLearningwithsomerecentadvances
•AurelienBellet,TutorialonMetricLearning,2013
•BrianKulis.TutorialonMetricLearning.InternationalConferenceonMachineLearning(ICML)2010
2014-03-05 29

Reference
•K.Q.Weinberger,J.Blitzer,andL.K.Saul(2006).InY.Weiss,B.Schoelkopf, andJ.Platt(eds.),DistanceMetricLearningforLargeMarginNearestNeighborClassification,AdvancesinNeuralInformationProcessingSystems18(NIPS-18).MITPress:Cambridge,MA.
•K.Q.Weinberger,L.K.Saul.DistanceMetricLearningforLargeMarginNearestNeighborClassification.JournalofMachineLearningResearch(JMLR)2009,10:207-244
•Torresani,L.,&Lee,K.(2007).Largemargincomponentanalysis. AdvancesinNeuralInformationProcessing
•Kedem,D.,Xu,Z.,&Weinberger,K.(n.d.).GradientBoostedLargeMarginNearestNeighbors,(1),10–12.
2014-03-05 30

Distance Metric Learning

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (12)

Similar to Distance Metric Learning

Similar to Distance Metric Learning (20)

Recently uploaded

Recently uploaded (20)

Distance Metric Learning