SlideShare a Scribd company logo
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
haventreddityet.com
A reddit recommendation discovery engine
James Douglas Pearce
1 / 13
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
HOW CAN I DISCOVER NEW SUBREDDITS?
Results seem to be skewed by a few very popular subreddits...
Simple list
No context
Does not encourage discovery!
2 / 13
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
Live Demo
3 / 13
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
ALGORITHM
Jaccard similarity: overlap of user in subreddits divided by total
Categories are defined as a set of example subreddits
4 / 13
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
GRAPH GERNERATION
Subgraphs generated on the fly with a series of random walks
Robert Frost rule: take the road less traveled
Hipster rule: Penalize popular, i.e. highly connected nodes
5 / 13
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
MORE ABOUT ME
Research:
Searching for Dark Matter at
the LHC
Big Data!
Specialize in data mining and
machine learning
Hobbies:
Kaggle competitions
Poker
Motorcycles
Boardgames
6 / 13
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
Questions?
7 / 13
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
UNIQUE CATEGORIES?
8 / 13
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
REDDIT BY CATEGORIES
9 / 13
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
JACCARD COEFFICIENT
The Jaccard coefficient can be used as a user-user type similarity measure.
JB(A) =
|A ∩ B|
|A ∪ B|
(1)
A is the set of redditors in subreddit A,
B is the set of redditors in subreddit B.
JCat
C (A) =
1
|C|
Bi∈C
JBi
(A) (2)
C is the set of sets of redditors in subreddits Bi in category C,
|C| is the size of set C (number of sets).
10 / 13
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
CRAWLING REDDIT WITH THE PRAW API
reddit.com is HUGE, I can only
take a small sample
The redditor-subreddit matrix I
am sampling from is sparse
I want to collect information
about as many subreddits as
possible
I want my redditor sets to
overlap
Strategy:
Collect “smart” data
Use “dumb” (Jaccard)
algorithm
11 / 13
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
MODEL VALIDATION
For each redditor hold out one subreddit they are part of
Make a list of the top N subreddits based on Jacquard similarity
Calculate the accuracy of that recommendation as a function of N12 / 13
THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES
SUBGRAPH GENERATOR
Ptransition(ni) ∝ Jcanada(ni) · αntrans
· βncon
(3)
α ∈ (0, 1) is a transition decay factor
ntrans is the number of times ni has been traversed
β ∈ (0, 1) is a connectivity decay factor
ncon is ni number of connections (degree)
13 / 13

More Related Content

Viewers also liked

Big Data, Health Outcomes & Marketing by Ben Wolin
Big Data, Health Outcomes & Marketing by Ben Wolin Big Data, Health Outcomes & Marketing by Ben Wolin
Big Data, Health Outcomes & Marketing by Ben Wolin
Everyday Health
 
Tiens OPP by Fiaz Hussain Malik
Tiens OPP by Fiaz Hussain MalikTiens OPP by Fiaz Hussain Malik
Tiens OPP by Fiaz Hussain Malik
Fiaz Hussain
 
JamesDPearce_MonoX
JamesDPearce_MonoXJamesDPearce_MonoX
JamesDPearce_MonoXJames Pearce
 
Types and classification of fractures
Types and classification of fracturesTypes and classification of fractures
Types and classification of fractures
Daaneyal Dilawar
 

Viewers also liked (6)

Galveston
GalvestonGalveston
Galveston
 
vicTheoryWorkshop
vicTheoryWorkshopvicTheoryWorkshop
vicTheoryWorkshop
 
Big Data, Health Outcomes & Marketing by Ben Wolin
Big Data, Health Outcomes & Marketing by Ben Wolin Big Data, Health Outcomes & Marketing by Ben Wolin
Big Data, Health Outcomes & Marketing by Ben Wolin
 
Tiens OPP by Fiaz Hussain Malik
Tiens OPP by Fiaz Hussain MalikTiens OPP by Fiaz Hussain Malik
Tiens OPP by Fiaz Hussain Malik
 
JamesDPearce_MonoX
JamesDPearce_MonoXJamesDPearce_MonoX
JamesDPearce_MonoX
 
Types and classification of fractures
Types and classification of fracturesTypes and classification of fractures
Types and classification of fractures
 

Similar to haventreddityet demo

ALFRED demo - www2013
ALFRED demo - www2013ALFRED demo - www2013
ALFRED demo - www2013
Disheng Qiu
 
Advanced #2 - ui perf
 Advanced #2 - ui perf Advanced #2 - ui perf
Advanced #2 - ui perf
Vitali Pekelis
 
Need 4 speed
Need 4 speedNeed 4 speed
Need 4 speed
alikonweb
 
Mining social data
Mining social dataMining social data
Mining social data
Malk Zameth
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
Spark Summit
 
Dealing with web scale data
Dealing with web scale dataDealing with web scale data
Dealing with web scale data
Jnaapti
 
Willow, the interaction tour
Willow, the interaction tourWillow, the interaction tour
Willow, the interaction tour
ESUG
 
Intelligent Ruby + Machine Learning
Intelligent Ruby + Machine LearningIntelligent Ruby + Machine Learning
Intelligent Ruby + Machine LearningIlya Grigorik
 
Eventual Consistency - Desining Fail Proof Systems
Eventual Consistency - Desining Fail Proof SystemsEventual Consistency - Desining Fail Proof Systems
Eventual Consistency - Desining Fail Proof Systems
Grzegorz Skorupa
 
Willow, the interaction tour by Maxi Tabacman
Willow, the interaction tour by Maxi TabacmanWillow, the interaction tour by Maxi Tabacman
Willow, the interaction tour by Maxi Tabacman
FAST
 
Scratch
ScratchScratch
Scratch
gulbrandsont
 
Hobbyist 3D Printing Basics
Hobbyist 3D Printing BasicsHobbyist 3D Printing Basics
Hobbyist 3D Printing Basics
John Abella
 
Improving Bug Tracking Systems
Improving Bug Tracking SystemsImproving Bug Tracking Systems
Improving Bug Tracking Systems
Rahul Premraj
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
Rajesh Rajamani
 

Similar to haventreddityet demo (16)

ALFRED demo - www2013
ALFRED demo - www2013ALFRED demo - www2013
ALFRED demo - www2013
 
Advanced #2 - ui perf
 Advanced #2 - ui perf Advanced #2 - ui perf
Advanced #2 - ui perf
 
Need 4 speed
Need 4 speedNeed 4 speed
Need 4 speed
 
Mining social data
Mining social dataMining social data
Mining social data
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
 
Dealing with web scale data
Dealing with web scale dataDealing with web scale data
Dealing with web scale data
 
Willow, the interaction tour
Willow, the interaction tourWillow, the interaction tour
Willow, the interaction tour
 
Intelligent Ruby + Machine Learning
Intelligent Ruby + Machine LearningIntelligent Ruby + Machine Learning
Intelligent Ruby + Machine Learning
 
Eventual Consistency - Desining Fail Proof Systems
Eventual Consistency - Desining Fail Proof SystemsEventual Consistency - Desining Fail Proof Systems
Eventual Consistency - Desining Fail Proof Systems
 
Willow, the interaction tour by Maxi Tabacman
Willow, the interaction tour by Maxi TabacmanWillow, the interaction tour by Maxi Tabacman
Willow, the interaction tour by Maxi Tabacman
 
Scratch
ScratchScratch
Scratch
 
Hobbyist 3D Printing Basics
Hobbyist 3D Printing BasicsHobbyist 3D Printing Basics
Hobbyist 3D Printing Basics
 
Improving Bug Tracking Systems
Improving Bug Tracking SystemsImproving Bug Tracking Systems
Improving Bug Tracking Systems
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 
Introduction to r
Introduction to rIntroduction to r
Introduction to r
 

Recently uploaded

Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 

haventreddityet demo

  • 1. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES haventreddityet.com A reddit recommendation discovery engine James Douglas Pearce 1 / 13
  • 2. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES HOW CAN I DISCOVER NEW SUBREDDITS? Results seem to be skewed by a few very popular subreddits... Simple list No context Does not encourage discovery! 2 / 13
  • 3. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES Live Demo 3 / 13
  • 4. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES ALGORITHM Jaccard similarity: overlap of user in subreddits divided by total Categories are defined as a set of example subreddits 4 / 13
  • 5. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES GRAPH GERNERATION Subgraphs generated on the fly with a series of random walks Robert Frost rule: take the road less traveled Hipster rule: Penalize popular, i.e. highly connected nodes 5 / 13
  • 6. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES MORE ABOUT ME Research: Searching for Dark Matter at the LHC Big Data! Specialize in data mining and machine learning Hobbies: Kaggle competitions Poker Motorcycles Boardgames 6 / 13
  • 7. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES Questions? 7 / 13
  • 8. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES UNIQUE CATEGORIES? 8 / 13
  • 9. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES REDDIT BY CATEGORIES 9 / 13
  • 10. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES JACCARD COEFFICIENT The Jaccard coefficient can be used as a user-user type similarity measure. JB(A) = |A ∩ B| |A ∪ B| (1) A is the set of redditors in subreddit A, B is the set of redditors in subreddit B. JCat C (A) = 1 |C| Bi∈C JBi (A) (2) C is the set of sets of redditors in subreddits Bi in category C, |C| is the size of set C (number of sets). 10 / 13
  • 11. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES CRAWLING REDDIT WITH THE PRAW API reddit.com is HUGE, I can only take a small sample The redditor-subreddit matrix I am sampling from is sparse I want to collect information about as many subreddits as possible I want my redditor sets to overlap Strategy: Collect “smart” data Use “dumb” (Jaccard) algorithm 11 / 13
  • 12. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES MODEL VALIDATION For each redditor hold out one subreddit they are part of Make a list of the top N subreddits based on Jacquard similarity Calculate the accuracy of that recommendation as a function of N12 / 13
  • 13. THE PROBLEM LIVE DEMO ALGO DATA THANK YOU! BONUS SLIDES SUBGRAPH GENERATOR Ptransition(ni) ∝ Jcanada(ni) · αntrans · βncon (3) α ∈ (0, 1) is a transition decay factor ntrans is the number of times ni has been traversed β ∈ (0, 1) is a connectivity decay factor ncon is ni number of connections (degree) 13 / 13