Prep Fo/r/ DIY
Find relevant content to plan your
next hobby or home-improvement project
Kristofor Nyquist
…but Reddit is a popularity contest
Given that I’m interested in a post:
1. What are similar posts I can read?
2. Who are the best people I can talk to?
Goal: find related DIY projects
Collecting / organizing data
• Scraped Reddit for do-it-yourself projects
– Collected the text content of each project:
title, externally linked blog post, comments,
and general topic
Data conversion
• Combine all of the text for a post into one
“document”
• Treat the document as a list of words
• Convert the list of words to a list of numbers
– Each number represents the “uniqueness” of a
particular word to its document
i.e. 0 means the word appears in every post;
1 means the word appears only in that post
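The 0-to-1 "uniqueness" scale described above can be sketched as a normalized inverse document frequency. This is only an illustration under stated assumptions: the function name `uniqueness_scores`, the whitespace tokenizer, and the normalization `log(N/df) / log(N)` are not from the slides, and the weighting actually used in the project may differ.

```python
import math
from collections import Counter


def uniqueness_scores(documents):
    """Score each word from 0 (in every document) to 1 (in exactly one).

    Assumption: the score is log(N / df) / log(N), where N is the number
    of documents and df is the number of documents containing the word.
    """
    n_docs = len(documents)
    if n_docs < 2:
        raise ValueError("need at least two documents to compare")
    # Document frequency: how many documents contain each word.
    doc_freq = Counter()
    for text in documents:
        doc_freq.update(set(text.lower().split()))
    return {
        word: math.log(n_docs / df) / math.log(n_docs)
        for word, df in doc_freq.items()
    }


# Hypothetical toy posts, not data from the project.
posts = [
    "build a cedar bench",
    "build a workbench from pine",
    "solder a led lamp",
]
scores = uniqueness_scores(posts)
print(scores["a"])      # appears in every post
print(scores["cedar"])  # appears in one post only
print(scores["build"])  # appears in two of three posts
```

In this toy corpus "a" scores 0 and "cedar" scores 1, matching the two endpoints named on the slide; "build" lands in between.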
Clustering
Text content is rich enough to cluster projects
Clustering
Classify projects with missing categories based on user inputs
Validating post similarity
Similar posts group well by general topic.
Deviations can make sense
Against random
Obvious that randomly picking posts is not suitable
About me (Kristofor Nyquist)
PhD Biophysics, UC Berkeley
BS Physics, WSU
Hobbies
Algorithm
Post similarity / Classification
• Turn the text into a list of numbers using term frequency-inverse
document frequency (tf-idf)
• “Compress” the data for speed
– ~70,000 dimensions to 80 dimensions
For similarity:
• Calculate cosine similarity between documents
• Present user with 5 most similar posts
For classification:
• Logistic regression (L1 regularization)
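The similarity step above can be sketched as follows: compute the cosine similarity between one post's vector and every other post's vector, then keep the top few. The helper names `cosine_similarity` and `most_similar` and the tiny 3-dimensional vectors are hypothetical; in the project the vectors would be the 80-dimensional compressed tf-idf vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def most_similar(query_index, vectors, k=5):
    """Indices of the k vectors most similar to vectors[query_index]."""
    query = vectors[query_index]
    scored = [
        (cosine_similarity(query, vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index  # do not recommend the post itself
    ]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Hypothetical document vectors, one row per post.
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
]
top = most_similar(0, vectors, k=2)
print(top)  # [1, 3]
```

Cosine similarity ignores vector length, so a long post and a short post on the same topic can still score as close neighbours.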
Settling on 80 PCs
The choice of 80 principal components (PCs) is somewhat arbitrary,
BUT the overall accuracy of the classifier has definitely converged…
even though 80 PCs capture only ~30% of the variance
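The "~30% of variance" figure is a cumulative explained-variance fraction: the variance along the first k components divided by the total variance. A minimal sketch, assuming the per-component variances are already available; the function name and the numbers are made up for illustration.

```python
def explained_variance_fraction(component_variances, k):
    """Fraction of total variance captured by the k largest components."""
    ordered = sorted(component_variances, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total


# Hypothetical per-component variances, not values from the project.
variances = [1.0, 4.0, 2.0, 3.0]
fraction = explained_variance_fraction(variances, k=2)
print(fraction)  # 0.7
```

A low fraction does not by itself mean the retained components are poor features; the slide's point is that classifier accuracy had already levelled off at 80 components.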
Validation numbers
                  NLP algorithm   Random
accuracy          0.62            0.17
recall:
  auto            0.73            0.05
  electronic      0.6             0.00
  home improve    0.54            0.18
  metalwork       0.44            0.00
  other           0.52            0.08
  outdoor         0.89            0.12
  woodworking     0.68            0.32
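Accuracy and per-class recall, the two metrics reported above, can be computed from paired true and predicted labels. The sketch below uses a hypothetical helper `accuracy_and_recalls` and made-up labels; it does not reproduce the project's numbers.

```python
from collections import Counter


def accuracy_and_recalls(truth, predicted):
    """Return overall accuracy and a {label: recall} dictionary."""
    correct = Counter()  # correctly predicted posts per true label
    total = Counter()    # all posts per true label
    for t, p in zip(truth, predicted):
        total[t] += 1
        if t == p:
            correct[t] += 1
    accuracy = sum(correct.values()) / len(truth)
    recalls = {label: correct[label] / total[label] for label in total}
    return accuracy, recalls


# Hypothetical labels for six posts.
truth = ["auto", "auto", "outdoor",
         "woodworking", "woodworking", "woodworking"]
predicted = ["auto", "outdoor", "outdoor",
             "woodworking", "woodworking", "auto"]
accuracy, recalls = accuracy_and_recalls(truth, predicted)
print(accuracy)
print(recalls)
```

Recall for a class is the share of posts truly in that class that the classifier labelled correctly, which is why it can differ widely between classes even when overall accuracy looks reasonable.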
Full BoW vs. 80 PCs
[Figure panels: 80 PCs; full tf-idf vector]
Validating classifier
[Figure: confusion matrix, column normalized. Axes: Prediction and Truth, over the classes auto, metalwork, home, electronic, other, outdoor, woodwork]
[Figure: one-vs.-rest ROC curves for Woodworking, Other, Home improvement]
With this application, we can tolerate false
positives. We can also present users
with more limited options,
maintaining higher accuracy
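The "column normalized" label on the confusion matrix means each column of raw counts is divided by its column sum. A minimal sketch, assuming rows are true classes and columns are predicted classes (the slide does not say which axis is which); the counts are made up.

```python
def column_normalize(matrix):
    """Divide every entry by the sum of its column."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    col_sums = [sum(matrix[r][c] for r in range(n_rows))
                for c in range(n_cols)]
    return [
        [matrix[r][c] / col_sums[c] if col_sums[c] else 0.0
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical 2-class counts: rows = truth, columns = prediction.
counts = [
    [6, 1],
    [2, 3],
]
normalized = column_normalize(counts)
print(normalized)  # [[0.75, 0.25], [0.25, 0.75]]
```

Under this orientation each column sums to 1, so a diagonal entry reads as the fraction of predictions for that class that were correct.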

