Neighborhood Troubles:
On the Value of User Pre-Filtering To Speed Up and Enhance
Recommendations
Emanuel Lacic, Dominik Kowald & Elisabeth Lex
Social Computing
Know-Center GmbH / Graz University of Technology
Motivation
RecSys in the Big Data era:
• Need to analyze a lot of data
– With many different data features and recommendable entities
• Handle frequent streams of new data
• Demand for real-time performance
User-based Collaborative Filtering
• UB-CF is (still) one of the most commonly utilized approaches
in both industry and academia
• Accomplished in two steps (both parallelizable):
– Determine the k nearest neighbors (similar users) for a target user $u_t$
– Recommend entities of these users that the target user $u_t$
has not yet consumed
• A lot of data? Parallelize and scale UB-CF?
– What if it’s not desirable or possible to
allocate additional computing resources?
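The two UB-CF steps named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Jaccard similarity, the summed-similarity scoring, and the toy `histories` data are assumptions made for the example.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two entity sets (an assumed similarity metric)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def ubcf_recommend(target, histories, k=2, top_n=3):
    """Recommend unseen entities for `target` from its k most similar users."""
    seen = histories[target]
    # Step 1: compute a similarity for every user sharing at least one entity,
    # then keep the k most similar ones.
    sims = {
        user: jaccard(seen, items)
        for user, items in histories.items()
        if user != target and seen & items
    }
    neighbors = sorted(sims, key=lambda u: (-sims[u], u))[:k]
    # Step 2: score entities the target has not consumed by the summed
    # similarity of the neighbors that consumed them (assumed scoring rule).
    scores = defaultdict(float)
    for user in neighbors:
        for entity in histories[user] - seen:
            scores[entity] += sims[user]
    return sorted(scores, key=lambda e: (-scores[e], e))[:top_n]


# Hypothetical toy data: user -> set of consumed entities.
histories = {
    "alice": {"cafe", "museum", "park"},
    "bob": {"cafe", "museum", "bar"},
    "carol": {"park", "zoo"},
    "dave": {"gym"},
}
print(ubcf_recommend("alice", histories))
```

Step 1 touches every user that shares an entity with the target, which is where the cost grows with the neighborhood size.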
Bottleneck: Neighbor processing
• We need to fetch the history of every neighbor and calculate the
similarity with the target user $u_t$
• Similarities need to be sorted in order to pick the top-k similar users
– Common implementations of such operations have a
complexity of $O(n \log n)$
– Could also be improved to a complexity of $O(n + k \log n)$
by additionally implementing a partial sort
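To make the two complexity figures above concrete, here is a small sketch that contrasts a full sort with a heap-based partial sort. The function names and the `sims` dictionary are hypothetical; the heap variant is one common way to reach $O(n + k \log n)$, not necessarily the one the authors had in mind.

```python
import heapq


def top_k_full_sort(sims, k):
    """O(n log n): sort all (user, similarity) pairs, then slice the top k."""
    return sorted(sims.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def top_k_partial_sort(sims, k):
    """O(n + k log n): build a heap once, then pop only the k best entries."""
    # Negate scores because heapq implements a min-heap.
    heap = [(-score, user) for user, score in sims.items()]
    heapq.heapify(heap)  # O(n)
    top = []
    for _ in range(min(k, len(heap))):
        neg_score, user = heapq.heappop(heap)  # O(log n) per pop
        top.append((user, -neg_score))
    return top


# Hypothetical similarities of candidate users to the target user.
sims = {"u1": 0.2, "u2": 0.9, "u3": 0.5, "u4": 0.9}
print(top_k_partial_sort(sims, 2))
```

Both functions break ties by user id, so they return the same list for the same input.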
Bottleneck: Neighbor processing
The larger the neighborhood of $u_t$ is, the larger the impact
on the runtime performance could be!
Adaptation: User Pre-Filtering
• Adapt the 1st step of UB-CF by pre-filtering the candidate set
of possible similar users
• Greedy strategy
– Find the top-N candidate users that share the most interacted
entities with the target user
• Aim to improve the runtime performance for those users
that have many neighbors
– Increase the probability that users with a high overlap end up
among the top-k similar users
$OV(u_t, u_c) = \Delta(u_t) \cap \Delta(u_c)$
(reading $\Delta(u)$ as the set of entities user $u$ interacted with; candidates are ranked by the size of this overlap)
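The greedy pre-filtering step can be sketched in memory as below. This is an illustration under assumptions: the inverted index, the function names, and the toy data are hypothetical, and the size of the overlap set is used as the ranking key.

```python
from collections import Counter, defaultdict


def build_inverted_index(histories):
    """Map each entity to the set of users that interacted with it."""
    index = defaultdict(set)
    for user, entities in histories.items():
        for entity in entities:
            index[entity].add(user)
    return index


def prefilter_candidates(target, histories, index, n):
    """Return the top-n (user, overlap size) pairs for the target user."""
    overlap = Counter()
    for entity in histories[target]:
        for user in index.get(entity, ()):
            if user != target:
                overlap[user] += 1
    # Highest overlap first; ties are broken by user id for determinism.
    return sorted(overlap.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical toy data: user -> set of interacted entities.
histories = {
    "alice": {"cafe", "museum", "park"},
    "bob": {"cafe", "museum", "bar"},
    "carol": {"park", "zoo"},
    "dave": {"gym"},
    "erin": {"cafe", "museum", "park", "zoo"},
}
index = build_inverted_index(histories)
print(prefilter_candidates("alice", histories, index, n=2))
```

Only the n returned candidates then need a full similarity computation, instead of every user that shares an entity with the target.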
Implementation details: ScaR
• Scalable entity recommender
framework that leverages the
Apache Solr search engine
http://scar.know-center.tugraz.at
Implementation details: ScaR
• Data Modification Layer
– Agent between the framework
and Apache Solr
• Recommender Customizer
– Configuration depending on
the entity recommendation
scenario (e.g., similarity
metric definition)
• Recommender Engine
$\mathrm{pred}(u_t, e) = \sum_{u_c \in \mathrm{neighbors}(u_t, e)} \mathrm{sim}(u_t, u_c)$
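As a worked instance of the prediction score above, the sketch below sums the similarities of those neighbors that interacted with a given entity. The similarity values, the neighbor list, and all names are made up for illustration.

```python
def pred(target, entity, neighbors, histories, sim):
    """Sum sim(target, c) over the neighbors c that interacted with `entity`."""
    return sum(
        sim[(target, candidate)]
        for candidate in neighbors
        if entity in histories[candidate]
    )


# Hypothetical precomputed similarities and neighbor histories.
sim = {("ut", "bob"): 0.5, ("ut", "carol"): 0.25}
histories = {"bob": {"bar", "zoo"}, "carol": {"zoo"}}
neighbors = ["bob", "carol"]

print(pred("ut", "zoo", neighbors, histories, sim))  # both neighbors contribute
print(pred("ut", "bar", neighbors, histories, sim))  # only bob contributes
```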
Implementation details: ScaR
• Need Efficient User Pre-Filtering
• Utilize Solr’s facet functionality
– Arrangement of results into
categories
– Possible to restrict with filters
• Reduce the search space and
get exactly the desired $OV(u_t, u_c)$
– Facet on users
– Filter on entities of target user $u_t$
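A facet request of the kind described above might be assembled as in the following sketch. It assumes one Solr document per (user, entity) interaction and hypothetical field names `user_id` and `entity_id`; the actual ScaR schema is not given in the slides. Only standard Solr request parameters are used, and no request is sent.

```python
from urllib.parse import urlencode


def overlap_facet_params(target_entities, n,
                         user_field="user_id", entity_field="entity_id"):
    """Build Solr parameters that count, per user, the matching interactions."""
    quoted = " OR ".join(f'"{e}"' for e in sorted(target_entities))
    return {
        "q": "*:*",
        "fq": f"{entity_field}:({quoted})",  # filter on the target's entities
        "rows": 0,                           # only the facet counts are needed
        "facet": "true",
        "facet.field": user_field,           # facet on users
        "facet.sort": "count",               # highest overlap first
        "facet.mincount": 1,
        "facet.limit": n + 1,                # +1: the target user matches too
        "wt": "json",
    }


def parse_facet_counts(response, target, n, user_field="user_id"):
    """Read Solr's flat [value, count, value, count, ...] facet list."""
    flat = response["facet_counts"]["facet_fields"][user_field]
    pairs = zip(flat[0::2], flat[1::2])
    return [(user, count) for user, count in pairs if user != target][:n]


params = overlap_facet_params({"e2", "e1"}, n=2)
print("/select?" + urlencode(params))

# Hypothetical response fragment in Solr's default JSON facet layout.
response = {"facet_counts": {"facet_fields": {
    "user_id": ["u9", 3, "ut", 3, "u4", 2, "u7", 1]}}}
print(parse_facet_counts(response, target="ut", n=2))
```

Under the one-document-per-interaction assumption, each facet count equals the size of the overlap between the target user and that candidate.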
Experimental Setup
• Foursquare Dataset
– Sparse with a right-tailed distribution that
has sharper peaks and broader tails
• Neighborhood size
– Average: 764
– Median: 4
– Maximum: 125,046
• Evaluation on all users that rated at
least 11 venues
– 58,046 users in total
– Withheld 10 rated venues from the dataset to be
predicted in the test set
– Evaluate on ranking accuracy (Precision, Recall, nDCG) and runtime
| Measure | Value |
| --- | --- |
| # Users | 2,153,471 |
| # Entities | 2,809,581 |
| # Ratings | 1,143,092 |
| Density | 0.000015 |
| Skewness | 227.53 |
| Kurtosis | 62,844.43 |
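The ranking metrics named in the setup can be computed as in the sketch below. It assumes binary relevance (a withheld venue is either hit or not) and divides precision by k; the slides do not state which exact variants were used, so treat these definitions as one common choice.

```python
import math


def precision_at_k(ranked, relevant, k=10):
    """Share of the k list positions that hold a relevant entity."""
    hits = sum(1 for entity in ranked[:k] if entity in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k=10):
    """Share of the relevant entities found within the first k positions."""
    if not relevant:
        return 0.0
    hits = sum(1 for entity in ranked[:k] if entity in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance nDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, entity in enumerate(ranked[:k])
        if entity in relevant
    )
    ideal = sum(1.0 / math.log2(p + 2) for p in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


# Hypothetical recommendation list and withheld test entities.
ranked = ["a", "x", "b", "y"]
relevant = {"a", "b", "c"}
print(precision_at_k(ranked, relevant, k=4))
print(recall_at_k(ranked, relevant, k=4))
print(ndcg_at_k(ranked, relevant, k=4))
```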
Experimental Setup
• Hypothesis: the same interval of values for top-k similar users is valid
for a greedy pick of candidate neighbors (i.e., between 20 and 60)
| Approach | Description |
| --- | --- |
| Most Popular | A baseline that recommends the most popular items |
| $CF_{Full}$ | UB-CF that calculates similarities for all neighbors |
| $CF_{OV=20}$ | Greedy pick of the top-20 overlapping users |
| $CF_{OV=40}$ | Greedy pick of the top-40 overlapping users |
| $CF_{OV=60}$ | Greedy pick of the top-60 overlapping users |
| $CF_{OV=80}$ | Greedy pick of the top-80 overlapping users |
| $CF_{OV=100}$ | Greedy pick of the top-100 overlapping users |
Results
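| Approach | $T$ (ms) | $\sigma$ (ms) | P@10 | R@10 | nDCG@10 | UC |
| --- | --- | --- | --- | --- | --- | --- |
| Most Popular | 78.59 | 20.00 | .0285 | .0285 | .0232 | 100 % |
| Collaborative Filtering: | | | | | | |
| $CF_{Full}$ | 2,053.45 | 9,600.63 | .0611 | .0527 | .0316 | 66.56 % |
| $CF_{OV=20}$ | 59.56 | 60.08 | .0586 | .0541 | .0318 | 65.87 % |
| $CF_{OV=40}$ | 65.47 | 69.61 | .0689 | .0645 | .0378 | 66.21 % |
| $CF_{OV=60}$ | 74.62 | 85.83 | .0724 | .0678 | .0396 | 66.10 % |
| $CF_{OV=80}$ | 82.40 | 102.75 | .0707 | .0661 | .0386 | 65.62 % |
| $CF_{OV=100}$ | 87.38 | 115.17 | .0693 | .0646 | .0373 | 65.70 % |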
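To read the table in relative terms, the short sketch below derives the runtime speed-up and the relative nDCG@10 change of each pre-filtered variant against the full UB-CF run. The numbers are copied from the table above; the dictionary layout and function name are only illustrative.

```python
# (runtime T in ms, nDCG@10) per approach, copied from the results table.
results = {
    "CF_Full": (2053.45, 0.0316),
    "CF_OV=20": (59.56, 0.0318),
    "CF_OV=40": (65.47, 0.0378),
    "CF_OV=60": (74.62, 0.0396),
    "CF_OV=80": (82.40, 0.0386),
    "CF_OV=100": (87.38, 0.0373),
}


def relative_to_full(results, baseline="CF_Full"):
    """Return {approach: (speed-up factor, relative nDCG@10 change)}."""
    base_runtime, base_ndcg = results[baseline]
    return {
        name: (base_runtime / runtime, (ndcg - base_ndcg) / base_ndcg)
        for name, (runtime, ndcg) in results.items()
        if name != baseline
    }


for name, (speedup, gain) in relative_to_full(results).items():
    print(f"{name}: {speedup:.1f}x faster, nDCG@10 {gain:+.1%}")
```

For the top-60 variant this works out to roughly a 27.5-fold runtime reduction together with about a 25 % higher nDCG@10 than the full neighborhood.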
Conclusion and Future Work
• Integrating User Pre-Filtering can speed up and enhance
recommendations
• Depending on the data, it can also increase the overall accuracy
– Can potentially help to reduce noise among the candidate entities
• Need to validate results in a more comprehensive study
using data with different types of entities
• Plan to validate our approach in the course of an online study
within the Analytics for Everyday Learning (AFEL) project
– See our AFEL-REC paper in the SIR workshop @ CIKM
Questions / suggestions?
Emanuel Lacić
elacic@know-center.at
Social Computing / Know-Center
