Querylog-based Assessment of Retrievability Bias in a Large Newspaper Corpus - Myriam Traub
This is the set of slides for my presentation at JCDL 2016 in Newark, USA.
Bias in the retrieval of documents can directly influence the information access of a digital library. In the worst case, systematic favoritism for a certain type of document can render other parts of the collection invisible to users. This potential bias can be evaluated by measuring the retrievability for all documents in a collection. Previous evaluations have been performed on TREC collections using simulated query sets. The question remains, however, how representative this approach is of more realistic settings. To address this question, we investigate the effectiveness of the retrievability measure using a large digitized newspaper corpus, featuring two characteristics that distinguish our experiments from previous studies: (1) compared to TREC collections, our collection contains noise originating from OCR processing, historical spelling and use of language; and (2) instead of simulated queries, the collection comes with real user query logs including click data.
First, we assess the retrievability bias imposed on the newspaper collection by different IR models. We assess the retrievability measure and confirm its ability to capture the retrievability bias in our setup. Second, we show how simulated queries differ from real user queries regarding term frequency and prevalence of named entities, and how this affects the retrievability results.
Recommendations and Discovery at StumbleUpon - Sumanth Kolar
RecSys 2012 Industry Track - Sumanth Kolar, StumbleUpon
It's human nature to be curious, to learn new things, to want to find out more. Discovery is an innate human need, and with the rise of the Web, the urge to learn more has increased by leaps and bounds. According to David Hornik, investor at August Capital, "The massive scale of the Web not only creates huge challenges for search, it also cripples discovery. Gone are the good old days in which fortuity would lead to the unearthing of interesting new websites." Indeed, we live in the age of "infovores" and there is definitely a need for a service that provides serendipity.
Providing serendipitous discovery that can inform, entertain and enlighten our users is of utmost importance to StumbleUpon. This talk will focus on how StumbleUpon uses several machine learning techniques such as collaborative filtering techniques, active learning, decision trees, Bayesian models and more to solve complex problems involving classification, user behavior analysis, modelling, anti-spam and recommendations. An average StumbleUpon user spends over 7 hours per month using the product, equating to hundreds of varied recommendations and ample feedback. The talk will also provide insights into some of StumbleUpon's rich data and how we can use scale to accomplish what would otherwise not be possible. We will look at innovative ways that StumbleUpon figures out the right metrics to evaluate recommender systems - a very complex problem. We will also discuss our research on StumbleUpon's mobile activity, which is growing 800% year over year and is the fastest growing part of our business, and how mobile recommendations are unique and important.
Bio: As Engineering Director at StumbleUpon, Sumanth Kolar leads the applied research team, overseeing recommendations, anti-spam, content analysis, user modeling, data sciences and infrastructure. Sumanth tackles very interesting and challenging research problems as StumbleUpon delivers more than 1 billion personalized recommendations a month to its more than 25 million users. Prior to joining the company in 2009, Sumanth engineered bidding and computer vision systems at Yahoo! and Adobe Research. Sumanth holds a master's degree in computer science from the University of California at Santa Cruz.
Recsys 2016 - Accuracy and Diversity in Cross-domain Recommendations for Cold... - Paolo Tomeo
Paper presentation at the 2016 ACM Recommender Systems conference in Boston (MIT).
Computing useful recommendations for cold-start users is a major challenge in the design of recommender systems, and additional data is often required to compensate for the scarcity of user feedback. In this paper we address this problem in a target domain by exploiting user preferences from a related auxiliary domain. Following a rigorous methodology for cold-start, we evaluate a number of recommendation methods on a dataset with positive-only feedback in the movie and music domains, both in single and cross-domain scenarios. Comparing the methods in terms of item ranking accuracy, diversity and catalog coverage, we show that cross-domain preference data is useful to provide more accurate suggestions when user feedback in the target domain is scarce or not available at all, and may lead to more diverse recommendations depending on the target domain. Moreover, evaluating the impact of the user profile size and diversity in the source domain, we show that, in general, the quality of target recommendations increases with the size of the profile, but may deteriorate with overly diverse profiles.
Classification and Detection of Micro-Level Impact-CSCW2017 (Link: http://dl....R R
Rezapour R, Diesner J (2017) Classification and Detection of Micro-Level Impact of Issue-Focused Films based on Reviews. Proceedings of 20th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2017), Portland, OR.
Personal rankings of educational institutions - Anna Lambrix
This is a UNIBZ research project devoted to personalized rankings and recommendations in the educational domain. The respondent is asked to rate a few universities, which are offered based on the geographic region they select. Based on these ratings, the system shows three lists of universities the respondent might be interested in.
This Master's thesis was undertaken by Anna Lambrix under the supervision of Nabil El Ioini and Mehdi Elahi.
In this paper, we propose a model to operationalise serendipity in content-based recommender systems. The model, called SIRUP, is inspired by Silvia's curiosity theory, which builds on the fundamental theory of Berlyne, and aims at (1) measuring the novelty of an item with respect to the user profile, and (2) assessing whether the user is able to manage such a level of novelty (coping potential). The novelty of items is calculated with cosine similarities between items, using Linked Open Data paths. The coping potential of users is estimated by measuring the diversity of the items in the user profile. We deployed and evaluated the SIRUP model in a use case with a TV recommender using a BBC programs dataset. Results show that the SIRUP model allows us to identify serendipitous recommendations while achieving 71% precision.
What are the negative effects of social media?: fighting fake information - Tomasz Kusmierczyk
The presentation addresses the problem of fake news and reviews on the Internet. In the first part, I present the characteristics of fake information. In the second part, I present the most recent approaches to dealing with this problem.
Smartphones as ubiquitous devices for behavior analysis and better lifestyle ... - University of Geneva
Final PhD defence, presented in March 2016 at the University of Padua, Italy, after a three-year PhD under the supervision of Prof. Ombretta Gaggi. The work focused on how smartphones can be used to understand and analyse user behaviour, and how this information can be used to promote a better lifestyle for individuals.
This is a colloquium that I presented on 4/22/21: Stockholm University, Nordic Institute for Theoretical Physics (NORDITA), WINQ–AlbaNova Colloquium
Here is a video of my talk: http://video.albanova.se/ALBANOVA20210422/video.mp4
All classifieds and listings-based sites are essentially the same. But as we move to mobile, it's getting harder and harder to make decisions about which hotel to stay at, or which house to rent. How can we satisfy the stop-start nature of mobile usage for these complex decisions, improve personalised recommendations, and create product differentiation? Has Faceted Search reached its limits? Is there a better way?
Get in touch: https://au.linkedin.com/in/jonharrison2000
Temporal Diversity in RecSys - SIGIR2010
1. Temporal Diversity in Recommender Systems
Neal Lathia (1), Stephen Hailes (1), Licia Capra (1), Xavier Amatriain (2)
(1) Dept. Computer Science, University College London
(2) Telefonica Research, Barcelona
ACM SIGIR 2010, Geneva
n.lathia@cs.ucl.ac.uk
@neal_lathia, @xamat
EU i-Tour Project
2. recommender systems
● many examples over different web domains
● a lot of research: accuracy
● multiple dimensions of usage that equate to user
satisfaction
3. evaluating collaborative filtering over time
● design a methodology to evaluate recommender systems
that are iteratively updated; explore temporal dimension
of filtering algorithms [1]
[1] N. Lathia, S. Hailes, L. Capra. Temporal Collaborative Filtering with
Adaptive Neighbourhoods. ACM SIGIR 2009, Boston, USA
4. temporal diversity
● ...is not concerned with diversity of a single set of
recommendations (e.g., are you recommended all six star
wars movies at once?)
● ...is concerned with the sequence of recommendations
that users see (are you recommended the same items
every week?)
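The distinction above can be made concrete with a small sketch. It assumes a simple set-difference measure over consecutive top-N lists: the share of this session's recommendations that were not in the previous session's list (diversity), and the share never shown in any earlier session (novelty). The function names, the item identifiers and the example lists are illustrative assumptions, not taken from the slides.

```python
from typing import Sequence, Set


def temporal_diversity(previous: Sequence[str], current: Sequence[str], n: int) -> float:
    """Fraction of the current top-n that was absent from the previous top-n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return len(set(current[:n]) - set(previous[:n])) / n


def temporal_novelty(seen_so_far: Set[str], current: Sequence[str], n: int) -> float:
    """Fraction of the current top-n never recommended in any earlier session."""
    if n <= 0:
        raise ValueError("n must be positive")
    return len(set(current[:n]) - seen_so_far) / n


# Hypothetical top-5 lists shown to one user in two consecutive weeks.
week1 = ["a", "b", "c", "d", "e"]
week2 = ["a", "b", "f", "g", "c"]

print(temporal_diversity(week1, week2, 5))  # 0.4: items f and g are new this week
```

A value of 0 means the user sees exactly the same items as last time; a value of 1 means the whole list changed. Order within the list is ignored in this sketch.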
5. contributions
● is temporal recommendation diversity important?
● how to measure temporal diversity and novelty?
● how much temporal diversity do state-of-the-art CF
algorithms provide?
● how to improve temporal diversity?
18. Closing Questions
surprise, unrest, rude
compliments, “spot on”
74% important / very important
23% neutral
86% important / very important
95% important / very important
21. how did this affect the way people rated?
S3 Random: Always Bad
22. how did this affect the way people rated?
S2 Popular: Quite Good
S3 Random: Always Bad
23. how did this affect the way people rated?
S2 Popular: Quite Good
S1 Starts off Quite Good
S1 Ends off Bad
S3 Random: Always Bad
...ANOVA details in paper...
33. main results
● as profile size increases, diversity decreases
● the more ratings added in the current session, the more
diversity will be experienced in next session
● more time between sessions leads to more diversity
34. consequences
● want to avoid having profiles that are too large
● (conflict #1) want to encourage users to rate as much as
possible
● (conflict #2) want users to visit often, but diversity
increases if they don't
● how does this relate back to traditional evaluation metrics?
42. contributions/summary
● temporal diversity is important
● defined (simple, extendable) metric to measure temporal
recommendation diversity
● analysed factors that influence diversity; most accurate
algorithm is not the most diverse
● hybrid-switching/re-ranking can improve diversity
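The summary names re-ranking as one way to improve diversity, but this transcript does not show how it was done. The following is only a hypothetical sketch of the general idea, not the authors' method: given a ranked candidate list and the set of items the user was shown before, it swaps the lowest-ranked repeated items in the top-N for the best-ranked unseen candidates until a minimum number of new items is reached. The function name and the `min_new` parameter are assumptions made for illustration, and the candidate list is assumed to contain no duplicates.

```python
from typing import List, Sequence, Set


def rerank_for_diversity(
    ranked: Sequence[str], previously_shown: Set[str], n: int, min_new: int
) -> List[str]:
    """Return a top-n list with at least min_new unseen items when candidates allow."""
    top = list(ranked[:n])
    # Unseen candidates ranked below the cut-off, best first.
    reserve = [item for item in ranked[n:] if item not in previously_shown]
    # Items in the top-n that the user has already been shown, best first.
    repeats = [item for item in top if item in previously_shown]
    new_count = len(top) - len(repeats)
    swaps = min(max(0, min_new - new_count), len(reserve), len(repeats))
    if swaps == 0:
        return top
    # Drop the lowest-ranked repeats and append the best unseen candidates.
    dropped = set(repeats[-swaps:])
    kept = [item for item in top if item not in dropped]
    return kept + reserve[:swaps]


# Hypothetical example: the user already saw the five best-scored items.
candidates = ["a", "b", "c", "d", "e", "f", "g", "h"]
shown = {"a", "b", "c", "d", "e"}
print(rerank_for_diversity(candidates, shown, n=5, min_new=2))
```

Raising `min_new` trades predicted accuracy for diversity, which matches the observation above that the most accurate algorithm is not the most diverse.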
43. Temporal Diversity in Recommender Systems
Neal Lathia (1), Stephen Hailes (1), Licia Capra (1), Xavier Amatriain (2)
(1) Dept. Computer Science, University College London
(2) Telefonica Research, Barcelona
ACM SIGIR 2010, Geneva
n.lathia@cs.ucl.ac.uk
@neal_lathia, @xamat
Supported by:
EU FP7 i-Tour
Grant 234239