- 1. Classification
  APAM E4990: Computational Social Science
  Jake Hofman, Columbia University
  April 26, 2013
- 2. Prediction a la Bayes¹
  - You're testing for a rare condition: 1% of the student population is in this class.
  - You have a highly sensitive and specific test: 99% of students in the class visit compsocialscience.org, and 99% of students who aren't in the class don't visit this site.
  - Given that a student visits the course site, what is the probability the student is in our class?

  ¹ Follows Wiggins, SciAm 2006
- 3. Prediction a la Bayes
  Students: 10,000 ppl
  - 1% in class: 100 ppl
    - 99% visit: 99 ppl
    - 1% don't visit: 1 ppl
  - 99% not in class: 9,900 ppl
    - 1% visit: 99 ppl
    - 99% don't visit: 9,801 ppl
- 4. Prediction a la Bayes
  (Same tree as above.) So given that a student visits the site (198 ppl), there is a 50% chance the student is in our class (99 ppl)!
- 5. Prediction a la Bayes
  (Same tree as above.) The small error rate on the large population outside of our class produces many false positives.
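The tree above can be checked with a few lines of arithmetic. This is a sketch added for illustration, not part of the slides:

```python
# Reproduce the population tree from the slides: 10,000 students,
# a 1% base rate, and a test with 99% sensitivity and specificity.
n = 10_000
in_class = 0.01 * n                    # 100 students in the class
not_in_class = n - in_class            # 9,900 students outside it

true_positives = 0.99 * in_class       # 99 in-class students who visit
false_positives = 0.01 * not_in_class  # 99 out-of-class students who visit

visitors = true_positives + false_positives      # 198 visitors in total
p_class_given_visit = true_positives / visitors  # 99 / 198 = 0.5
print(p_class_given_visit)
```

Note that the 99 false positives come from applying a small 1% error rate to the much larger out-of-class population, exactly matching the 99 true positives.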
- 6. Inverting conditional probabilities: Bayes' theorem
  Equate the far right- and left-hand sides of the product rule

      p(y|x) p(x) = p(x, y) = p(x|y) p(y)

  and divide to get the probability of y given x from the probability of x given y:

      p(y|x) = p(x|y) p(y) / p(x)

  where p(x) = Σ_{y ∈ Ω_Y} p(x|y) p(y) is the normalization constant.
- 7. Prediction a la Bayes
  Given that a patient tests positive, what is the probability the patient is sick?

      p(class|visit) = p(visit|class) p(class) / p(visit)
                     = (99/100 · 1/100) / (99/100² + 99/100²)
                     = (99/100²) / (198/100²)
                     = 99/198
                     = 1/2

  where p(visit) = p(visit|class) p(class) + p(visit|¬class) p(¬class).
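The same calculation can be wrapped in a small helper; the function name and argument order here are illustrative choices, not from the slides:

```python
def posterior(likelihood, prior, false_positive_rate):
    """Bayes' rule: p(y|x) = p(x|y) p(y) / p(x), with the evidence
    p(x) expanded as p(x|y) p(y) + p(x|~y) p(~y)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# The slide's numbers: p(visit|class) = 0.99, p(class) = 0.01,
# p(visit|~class) = 0.01
print(posterior(0.99, 0.01, 0.01))  # 0.5
```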
- 8. (Super) Naive Bayes
  We can use Bayes' rule to build a one-site student classifier:

      p(class|site) = p(site|class) p(class) / p(site)

  where we estimate these probabilities with ratios of counts:

      p̂(site|class)  = (# students in class who visit site) / (# students in class)
      p̂(site|¬class) = (# students not in class who visit site) / (# students not in class)
      p̂(class)  = (# students in class) / (# students)
      p̂(¬class) = (# students not in class) / (# students)
- 9. Naive Bayes
  Represent each student by a binary vector x, where x_j = 1 if the student has visited the j-th site (x_j = 0 otherwise).
  Modeling each site as an independent Bernoulli random variable, the probability of visiting a set of sites x given class membership c = 0, 1 is:

      p(x|c) = Π_j θ_jc^{x_j} (1 − θ_jc)^{1 − x_j}

  where θ_jc denotes the probability that the j-th site is visited by a student with class membership c.
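The Bernoulli likelihood above is just a product over sites: θ_jc for each visited site, (1 − θ_jc) for each skipped one. A minimal sketch, with hypothetical visit probabilities:

```python
import math

def bernoulli_likelihood(x, theta):
    """p(x|c) = prod_j theta_jc^x_j * (1 - theta_jc)^(1 - x_j)
    for a binary visit vector x and per-site probabilities theta."""
    return math.prod(t if xj else 1 - t for xj, t in zip(x, theta))

# Hypothetical visit probabilities for three sites under one class:
theta_c = [0.9, 0.2, 0.5]
x = [1, 0, 1]  # visited sites 0 and 2, skipped site 1
print(bernoulli_likelihood(x, theta_c))  # 0.9 * 0.8 * 0.5 ≈ 0.36
```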
- 10. Naive Bayes
  Using this likelihood in Bayes' rule and taking a logarithm, we have:

      log p(c|x) = log [ p(x|c) p(c) / p(x) ]
                 = Σ_j x_j log( θ_jc / (1 − θ_jc) ) + Σ_j log(1 − θ_jc) + log( θ_c / p(x) )
- 11. Naive Bayes
  We can eliminate p(x) by calculating the log-odds:

      log [ p(1|x) / p(0|x) ] = Σ_j x_j log( θ_j1 (1 − θ_j0) / ( θ_j0 (1 − θ_j1) ) )    ← the weights w_j
                              + Σ_j log( (1 − θ_j1) / (1 − θ_j0) ) + log( θ_1 / θ_0 )    ← the bias w_0

  which gives a linear classifier of the form w · x + w_0.
- 12. Naive Bayes
  We train by counting students and sites to estimate θ_jc and θ_c:

      θ̂_jc = n_jc / n_c        θ̂_c = n_c / n

  and use these to calculate the weights ŵ_j and bias ŵ_0:

      ŵ_j = log( θ̂_j1 (1 − θ̂_j0) / ( θ̂_j0 (1 − θ̂_j1) ) )
      ŵ_0 = Σ_j log( (1 − θ̂_j1) / (1 − θ̂_j0) ) + log( θ̂_1 / θ̂_0 )

  We predict by simply adding the weights of the sites that a student has visited to the bias term.
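The counting recipe above fits in a few lines. The toy data and function names here are illustrative, not from the slides, and the data is chosen so no estimate hits 0 or 1 (smoothing, discussed later, handles that in general):

```python
import math

def train(X, y):
    """Estimate theta_jc = n_jc / n_c and theta_c = n_c / n by counting,
    then fold them into log-odds weights w_j and bias w_0."""
    theta, prior = {}, {}
    for c in (0, 1):
        rows = [x for x, yc in zip(X, y) if yc == c]
        prior[c] = len(rows) / len(y)
        theta[c] = [sum(col) / len(rows) for col in zip(*rows)]
    w = [math.log(t1 * (1 - t0) / (t0 * (1 - t1)))
         for t0, t1 in zip(theta[0], theta[1])]
    w0 = sum(math.log((1 - t1) / (1 - t0))
             for t0, t1 in zip(theta[0], theta[1]))
    w0 += math.log(prior[1] / prior[0])
    return w, w0

def predict(x, w, w0):
    """Predict class 1 when the log-odds w . x + w_0 are positive."""
    return int(sum(wj * xj for wj, xj in zip(w, x)) + w0 > 0)

# Toy data: two sites, six students, three per class
X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0]
w, w0 = train(X, y)
print(predict([1, 1], w, w0), predict([0, 0], w, w0))  # 1 0
```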
- 13. Naive Bayes
  In practice, this works better than one might expect given its simplicity.²

  ² http://www.jstor.org/pss/1403452
- 14. Naive Bayes
  Training is computationally cheap and scalable, and the model is easy to update given new observations.³

  ³ http://www.springerlink.com/content/wu3g458834583125/
- 15. Naive Bayes
  Performance varies with document representations and corresponding likelihood models.⁴

  ⁴ http://ceas.cc/2006/15.pdf
- 16. Naive Bayes
  It's often important to smooth parameter estimates (e.g., by adding pseudocounts) to avoid overfitting.
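A common pseudocount scheme is Laplace smoothing; this sketch (the function name is an illustrative choice) shows why it matters for the log-odds weights:

```python
def smoothed_theta(n_jc, n_c, alpha=1.0):
    """Laplace-smoothed estimate: (n_jc + alpha) / (n_c + 2 * alpha).
    Pseudocounts keep a site never seen in one class from producing a
    zero probability and hence an infinite log-odds weight."""
    return (n_jc + alpha) / (n_c + 2 * alpha)

# A site visited by 0 of 100 out-of-class students:
print(smoothed_theta(0, 100))    # ~0.0098 instead of exactly 0
# ...and by all 100 in-class students:
print(smoothed_theta(100, 100))  # ~0.9902 instead of exactly 1
```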
