# Computational Social Science, Lecture 13: Classification

Published in: Education, Technology
### Computational Social Science, Lecture 13: Classification

1. Classification. APAM E4990, Computational Social Science. Jake Hofman, Columbia University. April 26, 2013.
2. Prediction a la Bayes¹

   - You're testing for a rare condition:
     - 1% of the student population is in this class
   - You have a highly sensitive and specific test:
     - 99% of students in the class visit compsocialscience.org
     - 99% of students who aren't in the class don't visit this site
   - Given that a student visits the course site, what is the probability that the student is in our class?

   ¹ Follows Wiggins, SciAm 2006
3. Prediction a la Bayes

   - Students: 10,000 ppl
     - 1% in class: 100 ppl
       - 99% visit: 99 ppl
       - 1% don't visit: 1 person
     - 99% not in class: 9,900 ppl
       - 1% visit: 99 ppl
       - 99% don't visit: 9,801 ppl
4. Prediction a la Bayes (same tree as slide 3)

   So given that a student visits the site (198 ppl), there is a 50% chance the student is in our class (99 ppl)!
5. Prediction a la Bayes (same tree as slide 3)

   The small error rate on the large population outside of our class produces many false positives.
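
The counting argument in the tree can be replayed in a few lines of Python. This is only a sketch of the arithmetic on the slide; the variable names are illustrative and not part of the lecture.

```python
# Population sizes and rates taken from the slide's tree.
n_students = 10_000
n_in_class = n_students * 1 // 100            # 1% of students are in the class
n_not_in_class = n_students - n_in_class      # the remaining 99%

visit_in_class = n_in_class * 99 // 100          # 99% of class members visit
visit_not_in_class = n_not_in_class * 1 // 100   # 1% of non-members visit

# Among everyone who visits the site, what fraction is in the class?
n_visitors = visit_in_class + visit_not_in_class
p_class_given_visit = visit_in_class / n_visitors
print(n_visitors, p_class_given_visit)  # 198 0.5
```
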
6. Inverting conditional probabilities: Bayes' Theorem

   Equate the far right- and left-hand sides of the product rule

   $$p(y|x)\,p(x) = p(x, y) = p(x|y)\,p(y)$$

   and divide to get the probability of $y$ given $x$ from the probability of $x$ given $y$:

   $$p(y|x) = \frac{p(x|y)\,p(y)}{p(x)}$$

   where $p(x) = \sum_{y \in \Omega_Y} p(x|y)\,p(y)$ is the normalization constant.
7. Predictions a la Bayes

   Given that a student visits the site, what is the probability that the student is in our class?

   $$p(\text{class}|\text{visit}) = \frac{\overbrace{p(\text{visit}|\text{class})}^{99/100}\;\overbrace{p(\text{class})}^{1/100}}{\underbrace{p(\text{visit})}_{99/100^2 \,+\, 99/100^2 \;=\; 198/100^2}} = \frac{99}{198} = \frac{1}{2}$$

   where $p(\text{visit}) = p(\text{visit}|\text{class})\,p(\text{class}) + p(\text{visit}|\overline{\text{class}})\,p(\overline{\text{class}})$.
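
The same calculation can be written as a small function. The sketch below assumes a binary outcome and uses exact fractions so the result matches the slide's 1/2; the function name `posterior` and its parameter names are hypothetical.

```python
from fractions import Fraction


def posterior(prior, likelihood, likelihood_complement):
    """Return p(y|x) for a binary y via Bayes' rule.

    prior:                  p(y)
    likelihood:             p(x|y)
    likelihood_complement:  p(x|not y)
    """
    # Normalization constant p(x), summed over both values of y.
    evidence = likelihood * prior + likelihood_complement * (1 - prior)
    return likelihood * prior / evidence


# Numbers from the slide: p(class) = 1/100, p(visit|class) = 99/100,
# p(visit|not class) = 1/100.
p = posterior(Fraction(1, 100), Fraction(99, 100), Fraction(1, 100))
print(p)  # 1/2
```

Changing only the prior shows how strongly the base rate drives the answer: with `prior=Fraction(1, 2)` the same test gives a posterior of 99/100.
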
8. (Super) Naive Bayes

   We can use Bayes' rule to build a one-site student classifier:

   $$p(\text{class}|\text{site}) = \frac{p(\text{site}|\text{class})\,p(\text{class})}{p(\text{site})}$$

   where we estimate these probabilities with ratios of counts:

   $$\hat{p}(\text{site}|\text{class}) = \frac{\#\,\text{students in class who visit site}}{\#\,\text{students in class}}$$

   $$\hat{p}(\text{site}|\overline{\text{class}}) = \frac{\#\,\text{students not in class who visit site}}{\#\,\text{students not in class}}$$

   $$\hat{p}(\text{class}) = \frac{\#\,\text{students in class}}{\#\,\text{students}}$$

   $$\hat{p}(\overline{\text{class}}) = \frac{\#\,\text{students not in class}}{\#\,\text{students}}$$
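
A minimal sketch of the one-site classifier, estimating each probability as a ratio of counts. The record layout (one `(in_class, visited)` pair per student) and the function name are assumptions made for illustration; the toy data reuses the population from the earlier tree.

```python
def one_site_posterior(records):
    """Estimate p(class|site) from (in_class, visited) pairs, one per student."""
    records = list(records)
    n = len(records)
    n_class = sum(1 for in_class, _ in records if in_class)
    n_not = n - n_class

    # Ratios of counts, as on the slide.
    p_site_class = sum(1 for c, v in records if c and v) / n_class
    p_site_not = sum(1 for c, v in records if not c and v) / n_not
    p_class = n_class / n
    p_not = n_not / n

    # Bayes' rule with p(site) expanded over both classes.
    p_site = p_site_class * p_class + p_site_not * p_not
    return p_site_class * p_class / p_site


# Toy data matching the tree: 99 + 1 students in class, 99 + 9801 outside it.
records = (
    [(True, True)] * 99 + [(True, False)] * 1
    + [(False, True)] * 99 + [(False, False)] * 9801
)
estimate = one_site_posterior(records)
print(estimate)
```
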
9. Naive Bayes

   Represent each student by a binary vector $x$ where $x_j = 1$ if the student has visited the $j$-th site ($x_j = 0$ otherwise).

   Modeling each site as an independent Bernoulli random variable, the probability of visiting a set of sites $x$ given class membership $c = 0, 1$:

   $$p(x|c) = \prod_j \theta_{jc}^{x_j} (1 - \theta_{jc})^{1 - x_j}$$

   where $\theta_{jc}$ denotes the probability that the $j$-th site is visited by a student with class membership $c$.
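
The product over sites translates directly into code. In this sketch the function name and the example values of $\theta_{jc}$ are made up for illustration; only the formula comes from the slide.

```python
import math


def bernoulli_likelihood(x, theta_c):
    """p(x|c) = prod_j theta_jc^x_j * (1 - theta_jc)^(1 - x_j).

    x:       binary visit vector, x[j] is 1 if site j was visited
    theta_c: per-site visit probabilities for one class c
    """
    # Each factor is theta_jc when the site was visited, 1 - theta_jc otherwise.
    return math.prod(t if xj else 1 - t for xj, t in zip(x, theta_c))


# Hypothetical parameters for three sites and one class.
theta_c = [0.9, 0.2, 0.5]
x = [1, 0, 1]
likelihood = bernoulli_likelihood(x, theta_c)  # 0.9 * (1 - 0.2) * 0.5
print(likelihood)
```
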
10. Naive Bayes

    Using this likelihood in Bayes' rule and taking a logarithm, we have:

    $$\log p(c|x) = \log \frac{p(x|c)\,p(c)}{p(x)} = \sum_j x_j \log \frac{\theta_{jc}}{1 - \theta_{jc}} + \sum_j \log(1 - \theta_{jc}) + \log \frac{\theta_c}{p(x)}$$
11. Naive Bayes

    We can eliminate $p(x)$ by calculating the log-odds:

    $$\log \frac{p(1|x)}{p(0|x)} = \sum_j x_j \underbrace{\log \frac{\theta_{j1}(1 - \theta_{j0})}{\theta_{j0}(1 - \theta_{j1})}}_{w_j} + \underbrace{\sum_j \log \frac{1 - \theta_{j1}}{1 - \theta_{j0}} + \log \frac{\theta_1}{\theta_0}}_{w_0}$$

    which gives a linear classifier of the form $w \cdot x + w_0$.
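
One way to check the linear form is to compare it numerically with the log-odds computed directly from the two joint probabilities. The sketch below does that for made-up parameter values; the function names and numbers are assumptions, and only the formulas for $w_j$ and $w_0$ come from the slide.

```python
import math


def weights(theta1, theta0, prior1):
    """Return (w, w0) for the log-odds form w . x + w0."""
    w = [math.log(t1 * (1 - t0) / (t0 * (1 - t1)))
         for t1, t0 in zip(theta1, theta0)]
    w0 = sum(math.log((1 - t1) / (1 - t0)) for t1, t0 in zip(theta1, theta0))
    w0 += math.log(prior1 / (1 - prior1))
    return w, w0


def log_odds_direct(x, theta1, theta0, prior1):
    """log p(1|x) - log p(0|x) from the Bernoulli likelihoods; p(x) cancels."""
    def log_joint(theta, prior):
        return sum(math.log(t if xj else 1 - t)
                   for xj, t in zip(x, theta)) + math.log(prior)
    return log_joint(theta1, prior1) - log_joint(theta0, 1 - prior1)


# Hypothetical parameters for three sites.
theta1 = [0.9, 0.2, 0.5]   # visit probabilities for c = 1
theta0 = [0.3, 0.4, 0.5]   # visit probabilities for c = 0
prior1 = 0.01
x = [1, 0, 1]

w, w0 = weights(theta1, theta0, prior1)
linear = sum(wj * xj for wj, xj in zip(w, x)) + w0
direct = log_odds_direct(x, theta1, theta0, prior1)
print(linear, direct)  # the two values should agree up to rounding
```
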
12. Naive Bayes

    We train by counting students and sites to estimate $\theta_{jc}$ and $\theta_c$:

    $$\hat{\theta}_{jc} = \frac{n_{jc}}{n_c} \qquad \hat{\theta}_c = \frac{n_c}{n}$$

    and use these to calculate the weights $\hat{w}_j$ and bias $\hat{w}_0$:

    $$\hat{w}_j = \log \frac{\hat{\theta}_{j1}(1 - \hat{\theta}_{j0})}{\hat{\theta}_{j0}(1 - \hat{\theta}_{j1})} \qquad \hat{w}_0 = \sum_j \log \frac{1 - \hat{\theta}_{j1}}{1 - \hat{\theta}_{j0}} + \log \frac{\hat{\theta}_1}{\hat{\theta}_0}$$

    We predict by simply adding the weights of the sites that a student has visited to the bias term.
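
The training and prediction steps might look like the following in plain Python. This is a sketch under stated assumptions: the data layout (a list of binary visit vectors `X` and 0/1 labels `y`), the function names, and the toy data are all invented for illustration, and no smoothing is applied, so every estimated $\hat{\theta}_{jc}$ must lie strictly between 0 and 1 for the logarithms to be defined.

```python
import math


def train(X, y):
    """Estimate theta_jc = n_jc / n_c and theta_c = n_c / n, then return (w, w0)."""
    n = len(y)
    n_c = [y.count(0), y.count(1)]
    n_sites = len(X[0])
    # theta[c][j]: fraction of students with label c who visited site j.
    theta = [[sum(x[j] for x, label in zip(X, y) if label == c) / n_c[c]
              for j in range(n_sites)]
             for c in (0, 1)]
    prior = [n_c[0] / n, n_c[1] / n]

    w = [math.log(theta[1][j] * (1 - theta[0][j])
                  / (theta[0][j] * (1 - theta[1][j])))
         for j in range(n_sites)]
    w0 = sum(math.log((1 - theta[1][j]) / (1 - theta[0][j]))
             for j in range(n_sites))
    w0 += math.log(prior[1] / prior[0])
    return w, w0


def predict(x, w, w0):
    """Add the weights of visited sites to the bias; positive log-odds means c = 1."""
    score = w0 + sum(wj for xj, wj in zip(x, w) if xj)
    return 1 if score > 0 else 0


# Toy data: two sites, four students per class.
X = [[1, 0], [1, 1], [1, 0], [0, 0],   # label 1
     [0, 1], [0, 1], [1, 1], [0, 0]]   # label 0
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, w0 = train(X, y)
print(predict([1, 0], w, w0), predict([0, 1], w, w0))  # 1 0
```
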
13. Naive Bayes

    In practice, this works better than one might expect given its simplicity²

    ² http://www.jstor.org/pss/1403452
14. Naive Bayes

    Training is computationally cheap and scalable, and the model is easy to update given new observations²

    ² http://www.springerlink.com/content/wu3g458834583125/
15. Naive Bayes

    Performance varies with document representations and corresponding likelihood models²

    ² http://ceas.cc/2006/15.pdf
16. Naive Bayes

    It's often important to smooth parameter estimates (e.g., by adding pseudocounts) to avoid overfitting
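
As a rough illustration of pseudocount smoothing: the slide does not give a formula, so the sketch below assumes the common add-$\alpha$ form for a binary feature, $\hat{\theta}_{jc} = (n_{jc} + \alpha) / (n_c + 2\alpha)$. The function name and the counts are hypothetical.

```python
def smoothed_theta(n_jc, n_c, alpha=1.0):
    """Add-alpha estimate of theta_jc (assumed form, not taken from the slides).

    n_jc:  number of students in class c who visited site j
    n_c:   number of students in class c
    alpha: pseudocount added to both the "visited" and "not visited" outcomes
    """
    return (n_jc + alpha) / (n_c + 2 * alpha)


# A site that no student in the class visited: the raw estimate is exactly 0,
# which would make log(theta) undefined in the weight formulas.
raw = 0 / 10
smoothed = smoothed_theta(0, 10)
print(raw, smoothed)
```

With `alpha=1.0` the estimate stays strictly between 0 and 1 even when a site was visited by none or all of the students in a class, so the log-odds weights remain finite.
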