
# Set prediction three ways

Slides from my talk at the Cornell SCAN seminar.
September 10, 2018.

Published in: Data & Analytics

### Set prediction three ways

1. Set prediction three ways. Austin R. Benson · Cornell SCAN Seminar · September 10, 2018. Joint work with Ravi Kumar & Andrew Tomkins (Google), Rediet Abebe & Jon Kleinberg (Cornell), and Michael Schaub & Ali Jadbabaie (MIT). Slides: bit.ly/arb-SCAN18
2. We usually think about predicting single-item events.
3. This talk looks at predicting sets from three perspectives. Set-based data is common, but we don’t have a great understanding of its complexities and the associated human behavior. • Team formation (writing papers, organizational behavior). • Multiple classification codes in hospital visits. • Co-purchasing sets on Amazon. • Sets of annotations on questions on web forums.
4. Set Prediction #1: Individuals repeating interactions. Given a history of an individual’s set-based interactions, which ones repeat? Who will repeat as my coauthors on my next paper? Reference: Sequences of Sets. Benson, Kumar, & Tomkins. KDD, 2018.
5. Lots of data looks like sequences of sets. Email: the sequence of recipient sets in my email ⟶ one sequence of sets; a collection of email senders ⟶ sequences of sets.
6. Lots of data looks like sequences of sets. Q&A forum tags.
7. Our work provides a generative model that captures the important characteristics of sequences of sets. Data: (1) Email: one sequence for each account; sets are the recipients on emails sent by the account. (2) Stack Exchange tags: one sequence for each user; sets are the tags on questions asked by the user. (3) Coauthorship: one sequence for each academic; sets are the coauthors on a paper. (4) Proximity contact: one sequence for each person; sets are the people interacting with the person. Datasets: tags-mathoverflow, tags-math-sx, email-Enron-core, email-Eu-core, contact-prim-school, contact-high-school, coauth-Business, coauth-Geology.
8. Our work provides a generative model that captures the important characteristics of sequences of sets. Applications: (1) Predicting new sets. (2) Understanding basic user behaviors. (3) Generative model ⟶ event likelihood ⟶ anomaly detection. (4) Generative model ⟶ simulation. (5) Amenable to analysis.
9. What are the important characteristics of the data?
10. Most sets are not entirely novel & many are exact repeats. [Figure panels: tags-mathoverflow, tags-math-sx, email-Enron-core, email-Eu-core, contact-prim-school, contact-high-school, coauth-Business, coauth-Geology]
11. Subsets and supersets of prior sets are common. [Figure panels: tags-mathoverflow, tags-math-sx, email-Enron-core, email-Eu-core, contact-prim-school, contact-high-school, coauth-Business, coauth-Geology]
12. There is recency bias in the repeat behavior. Consistent with previous results on sequences of single items. [Benson-Kumar-Tomkins 16; Anderson+ 14]
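
The characteristics named on the slides above (exact repeats, entirely novel sets, subsets and supersets of prior sets, and recency in repeats) can be measured on any single sequence of sets. The following is a minimal sketch of such a measurement, not the authors' code: the function names `classify_sets` and `repeat_gaps`, the label names, the order in which the categories are checked, and the toy `recipients` sequence are all assumptions made for illustration. The definitions used in the talk's figures may differ.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def classify_sets(sequence: Iterable[Iterable[Hashable]]) -> List[str]:
    """Label every set after the first by its relation to the earlier sets.

    Assumed precedence (illustrative only): exact repeat, then proper subset,
    then proper superset, then entirely novel, else partial overlap.
    """
    history = []        # all earlier sets, in order
    seen_items = set()  # union of all earlier sets
    labels = []
    for current in map(frozenset, sequence):
        if history:
            if current in history:
                labels.append("exact_repeat")
            elif any(current < prior for prior in history):
                labels.append("proper_subset")
            elif any(current > prior for prior in history):
                labels.append("proper_superset")
            elif current.isdisjoint(seen_items):
                labels.append("entirely_novel")
            else:
                labels.append("partial_overlap")
        history.append(current)
        seen_items |= current
    return labels


def repeat_gaps(sequence: Iterable[Iterable[Hashable]]) -> List[int]:
    """For each exact repeat, count the steps back to the latest identical set."""
    last_seen = {}
    gaps = []
    for index, current in enumerate(map(frozenset, sequence)):
        if current in last_seen:
            gaps.append(index - last_seen[current])
        last_seen[current] = index
    return gaps


# Hypothetical recipient sets for one email account, oldest first.
recipients = [
    {"ann", "bob"},
    {"ann"},
    {"ann", "bob"},
    {"ann", "bob", "cy"},
    {"dee"},
    {"bob", "eve"},
]

labels = classify_sets(recipients)
# labels == ["proper_subset", "exact_repeat", "proper_superset",
#            "entirely_novel", "partial_overlap"]
gaps = repeat_gaps(recipients)
# gaps == [2]
print(Counter(labels))
print(gaps)
```

Run over many sequences, the label counts give the fraction of sets that are exact repeats or entirely novel, and a histogram of the gaps that is concentrated at small values would indicate recency bias in repeats.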