
Computational Social Science, Lecture 13: Classification


  1. Classification. APAM E4990: Computational Social Science. Jake Hofman, Columbia University. April 26, 2013.
  2. Prediction a la Bayes
     • You're testing for a rare condition:
       • 1% of the student population is in this class.
     • You have a highly sensitive and specific test:
       • 99% of students in the class visit compsocialscience.org.
       • 99% of students who aren't in the class don't visit this site.
     • Given that a student visits the course site, what is the probability the student is in our class?
     [Footnote: Follows Wiggins, SciAm 2006]
  3. Prediction a la Bayes
     [Tree of counts] Out of 10,000 students, 1% are in the class (100 people): 99% of them visit the site (99 people) and 1% don't (1 person). The other 99% are not in the class (9,900 people): 1% of them visit the site (99 people) and 99% don't (9,801 people).
  4. Prediction a la Bayes
     [Same tree of counts] So given that a student visits the site (198 people), there is a 50% chance the student is in our class (99 people)!
  5. Prediction a la Bayes
     [Same tree of counts] The small error rate on the large population outside of our class produces many false positives.
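As a quick check, here is a minimal Python sketch of the counting argument in the tree above, using the slides' numbers (the variable names are illustrative, not from the course materials):

```python
# Counting argument from the slides: 10,000 students, 1% in the class,
# and a "99% sensitive, 99% specific" signal (visiting the course site).
population = 10_000

in_class = round(0.01 * population)                # 100 students in the class
not_in_class = population - in_class               # 9,900 students not in the class

visitors_in_class = round(0.99 * in_class)         # 99 visitors (true positives)
visitors_not_in_class = round(0.01 * not_in_class)  # 99 visitors (false positives)

total_visitors = visitors_in_class + visitors_not_in_class  # 198 visitors
print(visitors_in_class / total_visitors)          # 0.5: only half of visitors are in the class
```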
  6. Inverting conditional probabilities: Bayes' Theorem
     Equate the far right- and left-hand sides of the product rule
         p(y|x) p(x) = p(x, y) = p(x|y) p(y)
     and divide to get the probability of y given x from the probability of x given y:
         p(y|x) = p(x|y) p(y) / p(x)
     where p(x) = Σ_{y ∈ Ω_Y} p(x|y) p(y) is the normalization constant.
  7. Prediction a la Bayes
     Given that a patient tests positive, what is the probability the patient is sick? In our example:
         p(class|visit) = p(visit|class) p(class) / p(visit)
                        = (99/100)(1/100) / (99/100² + 99/100²)
                        = (99/100²) / (198/100²)
                        = 99/198 = 1/2
     where p(visit) = p(visit|class) p(class) + p(visit|¬class) p(¬class).
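The same 1/2 answer falls out of Bayes' rule directly; a small Python sketch with the probabilities from the slide:

```python
# p(class|visit) = p(visit|class) p(class) / p(visit), with
# p(visit) = p(visit|class) p(class) + p(visit|not class) p(not class)
p_class = 0.01                  # prior: 1% of students are in the class
p_visit_given_class = 0.99      # 99% of students in the class visit the site
p_visit_given_not_class = 0.01  # 1% of students not in the class visit the site

p_visit = (p_visit_given_class * p_class
           + p_visit_given_not_class * (1 - p_class))

p_class_given_visit = p_visit_given_class * p_class / p_visit
print(p_class_given_visit)      # 0.5
```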
  8. (Super) Naive Bayes
     We can use Bayes' rule to build a one-site student classifier:
         p(class|site) = p(site|class) p(class) / p(site)
     where we estimate these probabilities with ratios of counts:
         p̂(site|class)  = (# students in class who visit site) / (# students in class)
         p̂(site|¬class) = (# students not in class who visit site) / (# students not in class)
         p̂(class)  = (# students in class) / (# students)
         p̂(¬class) = (# students not in class) / (# students)
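A sketch of these count-based estimates in Python; the `students` list of (visited_site, in_class) pairs is made-up example data, not anything from the course:

```python
# Estimate the one-site classifier's probabilities by counting students.
# Each entry is (visited_site, in_class); the data below are hypothetical.
students = [(True, True), (True, True), (False, True),
            (True, False), (False, False), (False, False)]

n = len(students)
n_class = sum(1 for _, in_class in students if in_class)
n_not_class = n - n_class

p_class = n_class / n
p_not_class = n_not_class / n
p_site_given_class = sum(1 for v, c in students if v and c) / n_class
p_site_given_not_class = sum(1 for v, c in students if v and not c) / n_not_class

# Combine with Bayes' rule to score a student who visits the site
p_site = p_site_given_class * p_class + p_site_given_not_class * p_not_class
print(p_site_given_class * p_class / p_site)  # p(class|site)
```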
  9. Naive Bayes
     Represent each student by a binary vector x, where x_j = 1 if the student has visited the j-th site (x_j = 0 otherwise).
     Modeling each site as an independent Bernoulli random variable, the probability of visiting a set of sites x given class membership c = 0, 1 is
         p(x|c) = Π_j θ_jc^{x_j} (1 − θ_jc)^{1 − x_j}
     where θ_jc denotes the probability that the j-th site is visited by a student with class membership c.
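For one student's visit vector, this likelihood is just a product over sites. A tiny sketch with made-up θ values (purely illustrative):

```python
import numpy as np

# Bernoulli likelihood p(x|c) = prod_j theta_jc^x_j (1 - theta_jc)^(1 - x_j).
# theta (per-site visit probabilities for one class) and x (one student's
# binary visit vector) are hypothetical values, not estimates from data.
theta = np.array([0.9, 0.2, 0.5])
x = np.array([1, 0, 1])

likelihood = np.prod(theta ** x * (1 - theta) ** (1 - x))
print(likelihood)  # 0.9 * 0.8 * 0.5 = 0.36
```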
  10. Naive Bayes
      Using this likelihood in Bayes' rule and taking a logarithm, we have:
          log p(c|x) = log [ p(x|c) p(c) / p(x) ]
                     = Σ_j x_j log [θ_jc / (1 − θ_jc)] + Σ_j log (1 − θ_jc) + log [θ_c / p(x)]
  11. Naive Bayes
      We can eliminate p(x) by calculating the log-odds:
          log [p(1|x) / p(0|x)] = Σ_j x_j log [θ_j1 (1 − θ_j0) / (θ_j0 (1 − θ_j1))]
                                  + Σ_j log [(1 − θ_j1) / (1 − θ_j0)] + log [θ_1 / θ_0]
      where the coefficient of x_j is the weight w_j and the x-independent terms form the bias w_0, which gives a linear classifier of the form w · x + w_0.
  12. Naive Bayes
      We train by counting students and sites to estimate θ_jc and θ_c:
          θ̂_jc = n_jc / n_c        θ̂_c = n_c / n
      and use these to calculate the weights ŵ_j and bias ŵ_0:
          ŵ_j = log [θ̂_j1 (1 − θ̂_j0) / (θ̂_j0 (1 − θ̂_j1))]
          ŵ_0 = Σ_j log [(1 − θ̂_j1) / (1 − θ̂_j0)] + log [θ̂_1 / θ̂_0].
      We predict by simply adding the weights of the sites that a student has visited to the bias term.
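Putting the last few slides together, here is a minimal NumPy sketch of training by counting and predicting with the resulting linear classifier. X, y, and x_new are made-up example data, and the sketch assumes no estimated θ is exactly 0 or 1 (the smoothing slide below addresses that):

```python
import numpy as np

# Bernoulli naive Bayes trained by counting, following the slides' notation.
# X (students x sites binary matrix), y (class membership), and x_new are
# hypothetical example data.
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
y = np.array([1, 1, 1, 0, 0, 0])

n_c = np.array([(y == 0).sum(), (y == 1).sum()])                 # n_c
n_jc = np.array([X[y == 0].sum(axis=0), X[y == 1].sum(axis=0)])  # n_jc

theta_c = n_c / len(y)             # theta_c  = n_c  / n
theta_jc = n_jc / n_c[:, None]     # theta_jc = n_jc / n_c

# Weights and bias of the equivalent linear classifier w . x + w_0
w = np.log(theta_jc[1] * (1 - theta_jc[0])) - np.log(theta_jc[0] * (1 - theta_jc[1]))
w0 = (np.log((1 - theta_jc[1]) / (1 - theta_jc[0])).sum()
      + np.log(theta_c[1] / theta_c[0]))

# Predict "in class" when the log-odds w . x + w_0 are positive
x_new = np.array([1, 0, 1])
print(x_new @ w + w0 > 0)          # True
```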
  13. Naive Bayes
      In practice, this works better than one might expect given its simplicity.
      [Footnote: http://www.jstor.org/pss/1403452]
  14. Naive Bayes
      Training is computationally cheap and scalable, and the model is easy to update given new observations.
      [Footnote: http://www.springerlink.com/content/wu3g458834583125/]
  15. Naive Bayes
      Performance varies with document representations and corresponding likelihood models.
      [Footnote: http://ceas.cc/2006/15.pdf]
  16. Naive Bayes
      It's often important to smooth parameter estimates (e.g., by adding pseudocounts) to avoid overfitting.
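A sketch of what pseudocount smoothing might look like for the θ̂_jc estimates; the counts and the choice α = 1 are illustrative assumptions, not values from the slides:

```python
import numpy as np

# Smooth theta_jc = n_jc / n_c by adding alpha pseudo-visits and alpha
# pseudo-non-visits per (site, class), so no estimate is exactly 0 or 1
# (which would make the log-odds weights infinite).
alpha = 1.0  # illustrative pseudocount

n_jc = np.array([[1, 1, 0],   # hypothetical visit counts for class 0 (note the zero)
                 [3, 2, 2]])  # hypothetical visit counts for class 1
n_c = np.array([3, 3])

theta_unsmoothed = n_jc / n_c[:, None]
theta_smoothed = (n_jc + alpha) / (n_c[:, None] + 2 * alpha)

print(theta_unsmoothed[0])  # [0.333 0.333 0.   ]  -- contains an exact zero
print(theta_smoothed[0])    # [0.4   0.4   0.2  ]  -- pulled away from 0 and 1
```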
