
Research presented at KDD 2016.

Published in: Engineering

Online Optimization Methods for the Quantification Problem

Purushottam Kar¹, Shuai Li², Harikrishna Narasimhan³, Sanjay Chawla⁴, Fabrizio Sebastiani⁴
¹IIT Kanpur, India, ²University of Insubria, Italy, ³Harvard University, USA, ⁴QCRI-HBKU, Qatar
Full Paper: 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Goal: Estimate the relative prevalence of classes of interest in large unlabeled populations in online, streaming settings

Applications of Quantification
Several applications directly require estimates of class ratios
a.k.a. Counting, Class probability re-estimation, Class prior estimation
  • Sentiment Analysis
  • Epidemiology
  • Psephology
  • Cause-specific Mortality analysis
  • Transfer Learning

Sentiment Analysis (example tweets)
  • KatyCipriano: The best part of the meal is the dessert which they dont make themselves – just sayin. @bouzagloabc (2 hours ago)
  • JuliaChild: Loved the food – worth the 45 minute wait! Can’t wait for my Sunday brunch at ABC. @bouzagloabc (1 hour ago)
  • GordonRamsay: It was RAAAAW. @bouzagloabc (3 days ago)
  • PaulaDeen: @GordonRamsay Samy the owner threw me out just for pointing that out! Disastrous service (2 days ago)

Challenges
1. Solving a classification problem first may be wasteful
2. Need to address class distribution drift in test sets

Classification accuracy: 50%. But … #False pos. = #False neg. ⇒ Perfect quantification (perfect classification impossible)

Quantification Performance Measures
1. Capture quantification goals directly, OR
2. Balance quantification and classification goals (hybrid)
3. Challenging to optimize on voluminous, streaming data

† Quantification Performance Measure
‡ Hybrid Performance Measure
Measure families: Nested Concave Measures, Pseudo-concave Measures
Measures: NegKLD†, QMeasure‡, BAKLD‡, CQReward‡, BKReward‡, Normalized Square Score†
Balanced Accuracy (BA)

Observation: All quantification measures are naturally nested concave or pseudo-concave. Exploit this to optimize scalably?

Nested Concave Measures
1. Dual computation of nested functions difficult, costly updates
2. Solution: apply duality to nested functions in nested manner!

Key Idea: Fenchel Duality
  • Any ccv function has a Fenchel “dual” expressed through dual variables
  • Linear in TPR and TNR for fixed values of dual variables!

NEMSIS (streaming)
1. Receive a data point
2. Fix dual variables, take SGD step to update model
3. Fix model, take SGD steps to update dual variables
4. Updates extremely cheap: closed form for dual variables

Pseudo-concave Measures
Key Idea: Level Set Structure
1. Use the level set function as a proxy objective function
2. Exploit the fact that the level set functions are concave
  • Level sets are convex
  • [Figure: level function; regions labeled ccv, cvx, ccv]

CAN (non-streaming)
  • Find new level
  • Optimize proxy
  • Progress in proxy provably linked to progress in perf.

SCAN (streaming)
1. Execute E and M steps approximately in “streaming epochs”
2. E epochs use streaming data to estimate
3. M epochs execute NEMSIS on streaming data - optimize proxy
4. Epochs made progressively longer: more accurate E, M steps
  • [Figure: epoch schedule E M E M E M E M …]

Theoretical Guarantees
  • Guarantee for NEMSIS, SCAN
  • Guarantee for CAN

Experimental Results
ccv: concave, cvx: convex
NS: dual updates made using actual TPR/TNR values, not surrogates
  • Superior accuracies and training times across quant and hybrid measures as well as datasets
  • Attractive trade-off b/w quant/class performance using BAKLD perf.
  • Robustness to drift in class proportions (smaller is better in PosKLD)
  • [Figures: panels for the KDD08, PPI, Covertype, Adult and Cod-RNA datasets]
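
The remark that equal false-positive and false-negative counts give perfect quantification even at 50% classification accuracy can be checked numerically. The sketch below is illustrative only: the toy labels, the helper names (`confusion_counts`, `prevalence`, `kld`) and the use of plain "classify and count" are assumptions made here, not something the poster specifies, and `kld` is the standard KL divergence, which may be normalized differently from the poster's NegKLD.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def prevalence(labels):
    """Fraction of positives; on predictions this is 'classify and count'."""
    return sum(labels) / len(labels)

def kld(p_true, p_est, eps=1e-12):
    """Standard KL divergence between two binary class distributions."""
    total = 0.0
    for p, q in ((p_true, p_est), (1.0 - p_true, 1.0 - p_est)):
        if p > 0:
            total += p * math.log(p / max(q, eps))
    return total

# Hypothetical toy data: half the predictions are wrong,
# but the errors cancel (one false positive, one false negative).
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
true_prev = prevalence(y_true)
est_prev = prevalence(y_pred)
divergence = kld(true_prev, est_prev)

print(f"accuracy={accuracy}, fp={fp}, fn={fn}")
print(f"true prevalence={true_prev}, estimated prevalence={est_prev}")
print(f"KLD={divergence}")
```

Because the estimated prevalence depends only on the number of predicted positives (TP + FP), and the true prevalence on TP + FN, the two coincide whenever FP = FN, regardless of how many individual points are misclassified.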
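
The Fenchel duality idea for concave measures can be written out as follows. This is a sketch using the standard concave conjugate; the symbols $\Psi$, $\Psi^{*}$, $\alpha$, $\beta$, $u$, $v$ are notation chosen here for illustration and are not taken from the poster.

```latex
% Assumption: \Psi is a closed concave function of (TPR, TNR).
\Psi(\mathrm{TPR}, \mathrm{TNR})
  = \inf_{\alpha, \beta}
    \bigl\{ \alpha\,\mathrm{TPR} + \beta\,\mathrm{TNR} - \Psi^{*}(\alpha, \beta) \bigr\},
\qquad
\Psi^{*}(\alpha, \beta)
  = \inf_{u, v}
    \bigl\{ \alpha\,u + \beta\,v - \Psi(u, v) \bigr\}.
```

For fixed dual variables $(\alpha, \beta)$ the bracketed expression is linear in TPR and TNR, which is what makes an alternating scheme of the kind listed under NEMSIS plausible: a model update against a linear objective, followed by an update of the dual variables.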