Slide deck of my lecture at the SIKS course "Advances in Information Retrieval".
Read more here: https://graus.nu/blog/bias-in-recommendations-lecture-siks-course-on-advances-in-ir/
RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity. (David Graus)
The document summarizes research on recommender systems in the media industry. It discusses how FD Mediagroup uses recommender systems for their SMART Radio and SMART Journalism products. Key aspects of building a recommender system that FD focuses on include relevance, usefulness, and trust. Relevance is evaluated using metrics like NDCG, MAP, and R-Precision. Usefulness considers both algorithmic goals like diversity and business goals. Trust is evaluated based on whether users engage with the recommender system.
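For readers unfamiliar with the relevance metrics named above, here is a minimal sketch of NDCG@k and average precision (MAP is the mean of average precision over users or queries). The function names are illustrative and not taken from the slides. Two simplifying assumptions: the ideal ordering for NDCG is computed from the labels in the ranked list itself, and average precision is normalised by the number of relevant items that appear in the list.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    # Rank 1 gets discount log2(2) = 1, rank 2 gets log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the best possible ordering of the same labels."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def average_precision(binary_relevances):
    """Mean of precision@rank taken at each rank that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(binary_relevances, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: average precision averaged over several ranked lists."""
    return sum(average_precision(r) for r in rankings) / len(rankings) if rankings else 0.0
```

A ranking that already lists items in descending order of relevance scores an NDCG of 1.0; swapping a highly relevant item to a lower position lowers the score.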
Are you looking for tools to help you run your business that won’t break the bank? This fast-paced session is for you! Learn about free and low-cost tools for productivity, marketing, communications and basic office functions. We’ll cover what the tools can do for you and where to get them. Don’t miss this opportunity to explore new ways to solve common problems with uncommon tools. Assumes a working knowledge of web browsers. Primarily for PC users, although Mac availability will be covered where possible.
The document outlines the entrepreneurship programs and events hosted by McGill University's Dobson Centre for Entrepreneurship between September 2018 and June 2019. These include various startup competitions, workshops on topics such as pitching, financials and mental health in startups, and accelerator programs. Resources available through McGill Library to support entrepreneurship research are also listed, such as industry reports, market data, and assistance from a liaison librarian.
Talk given at Griffith University in Australia on trends in Research Data Management, FAIR, and current progress towards this in the European Open Science Cloud.
Overview of personal blogs and websites in radiology and how they may be used by radiologists to help patients and professionals. Presented at the European Congress of Radiology on 27th February 2019 by Dr Christopher Clarke.
Link to full presentation: https://ecronline.myesr.org/ecr2019/index.php?p=recording&t=recorded&lecture=personal-blogs-and-websites-in-radiology
This document provides an orientation for PhD students at McGill University's Department of Integrated Studies in Education. It includes information on student groups and events, useful links, the DISE doctoral program, funding opportunities, and advice. Key details about the student-faculty relationship and responsibilities are outlined. Essential resources for success at McGill such as services, funding, careers support and academic regulations are also summarized.
GraphTalk Copenhagen - Killing Data Silos in the Life Sciences with Neo4j (Neo4j)
This document discusses how clinical research data is often siloed across different systems and standards, making it difficult to integrate and analyze. It proposes using Neo4j, a graph database, to link clinical research data and overcome these silos. Key benefits include being able to trace data for regulatory purposes, maintain a single source of truth, and reduce redundant copies across different systems and standards. The document provides examples of how Neo4j could be used for a study workbench, integrating electronic health records, and mining clinical definitions.
The MILab at Texas State University launched in 2016 to support teaching and research in emerging digital media. It provides resources like a student lab and maker space. MILab is organized by areas like VR/AR, social media analytics, multimedia, UX design, and coding. It hosts events like a digital media speaker series and has received grants for projects from the Knight Foundation and ProPublica. Enrollment in the Digital Media program has grown to over 200 students representing diverse backgrounds. Graduates have found internships and jobs in social media agencies and many pursue further education.
What makes it worth becoming a Data Engineer? (Hadi Fadlallah)
This presentation explains what data engineering is for non-computer science students and why it is worth being a data engineer. I used this presentation while working as an on-demand instructor at Nooreed.com.
"Blue Commons" - Shared Cultural Value of Water & Public Space (Carter Craft)
Presentation at the "Reclaiming the Estuary" event on March 9, 2017, hosted by Prof. Sarah Durand, LaGuardia Community College, and Willis Elkins, Newtown Creek Alliance. Presented by Carter Craft, Sr. Economic Officer, Consulate General of the Netherlands in NYC.
Master of Public Administration (MPA) Academic Plan of Deepak (Danny) Singh a... (Danny Singh, M.B.A., MSEd)
The academic plan verifies I completed all the course requirements for the Master of Public Administration degree at Excelsior College #Excelsiorproud :)
This document summarizes a presentation about the power of listening in social media. It includes 3 case studies of government agencies that have implemented social media monitoring tools. It discusses the value of listening before engaging, including identifying influencers, patterns, sentiment and keywords. It also outlines different types of social media channels and metrics that can be used, such as likes, shares, retweets and comments. The document concludes with an agenda item for a workshop on developing goals and best practices for social media monitoring.
Digital experience insights - through the eyes of students and staff (Jisc)
The document discusses Jisc's digital experience insights service which helps educational institutions understand how students and staff experience their digital environments. It provides case studies of three institutions - University of Stirling, City of Wolverhampton College, and Canterbury Christ Church University - that used the service and saw benefits such as improvements to WiFi coverage, website usability, and student support services. The service helps institutions gather feedback on digital experiences and identify areas for enhancement.
The Digital Curation Centre (DCC) provides services to help organizations develop their research data management strategies and practices. The DCC assesses needs, provides advocacy support, pilots tools, develops guidance and training, and assists with creating customized data management plans and policies. It offers resources like data audits, data management planning tools, risk assessment methods, and training courses. The DCC works with institutions to help strengthen their research data management capabilities.
The document is a summary of a social media training session hosted by Christina Krause and Kevin Smith at the IHI National Forum on Quality Improvement in Health Care in December 2013.
The training covered using social media as a strategy for healthcare improvement and getting hands-on experience with Twitter. Attendees participated in networking activities and exercises to brainstorm how social media could help or hinder healthcare goals.
The session reviewed best practices for social media engagement including developing clear strategies, guidelines, and content like stories, presentations, photos and videos to share on platforms like Facebook, Twitter, and email newsletters. Metrics like open and click rates were discussed to measure effectiveness.
Plans for the University of Virginia School of Data Science (Melissa Moody)
This document proposes establishing a School of Data Science (SDS) at the University of Virginia. It was prepared by the Data Science Institute (DSI) to justify expanding the DSI into the SDS.
The document defines data science and proposes a mission for the SDS to train a diverse data science workforce, undertake leading-edge interdisciplinary research, and maximize societal benefit through data science. It outlines goals for the SDS's education, research, workforce development, and community engagement.
The document argues that a School is needed due to growing demand for data science education and employment opportunities. It describes how the SDS would build on the success of the existing DSI programs while strengthening collaborations across UVA schools and departments.
DASA Innovation Partner, Tony Collins, discusses International Outreach.
DASA Senior Exploitation Manager, Eleanor Rice, discusses exploitation of innovation.
DASA Access to Mentoring and Finance Lead, Alan Scrase, discusses how his support will add value.
ACE Advising Research Workshop Series 5: Creating a Research Proposal (mgabra18)
This document outlines topics to be covered in a workshop on writing research proposals, including: an introduction to research proposals and what they are used for; distinguishing between research design and methods; determining the scope of a research project; and planning and managing the project. The workshop aims to help attendees learn how to identify the structure of a research proposal, select appropriate research designs and methods, articulate the needs and timeline of a project, and maintain motivation. Attendees will also learn about obtaining IRB approval and preparing for uncertainties.
Presentation by Lisa Federer (UCLA) on 16 July 2013 as part of the IMLS-sponsored DMPTool Webinar Series.
Description: This webinar will discuss the special needs of health sciences researchers and help you learn how to talk to researchers in the health and medical fields about their data management needs. We will cover NIH Data Sharing Policy and how to write a data management plan that meets NIH’s requirements. After viewing this webinar, participants will understand: who is required to submit a plan; specific information that should be included in a plan; how to use the DMPTool to write an NIH-specific DMP; and where to find additional resources for help.
These are the slides from the Work Smarter Together event run on 23 October 2019.
If you download them you'll get to see the slide transitions and speaker notes, which do not show in SlideShare (at least not that I know how to make it happen).
Michael
A project is a planned set of activities with defined start and end dates undertaken to achieve specific objectives within constraints of time, cost, quality and resources. Projects vary in size and complexity but all follow a typical life cycle of starting, organizing, carrying out work, and closing. Project management is the application of knowledge, skills, tools and techniques to project activities to meet project requirements. It involves planning, organizing, monitoring and controlling project resources to produce project deliverables within defined scope, quality, time and cost constraints. Project management aims to achieve the project objectives and obtain benefits that fulfill organizational strategies.
The University of Sydney has developed tools to help researchers with research data management planning. They created a research data management planning tool in 2014 called ReDBox to help researchers capture data management information and request access to storage and computing resources. Over time, they continued improving the tool based on researcher feedback. The future involves continuing to refine the tool based on reviews of the research data management landscape and users' experiences.
This document provides an introduction to research data management for geoscience PhD students. It defines research data and different data types. It discusses the importance of managing data throughout its lifecycle for efficient and valid research. It outlines funder requirements, university policies, and activities involved in good research data management like data planning, documentation, storage, sharing and preservation.
Don't Mention the G Word - How the University of Sheffield got Googled (Andy Tattersall)
- The University of Sheffield is moving over 25,000 students and 6,500 staff to Google Apps for Education to meet growing demands for data storage, access, communication tools, and social platforms.
- Their existing student email and file storage systems are outdated and inflexible, while staff are using external tools like Dropbox.
- The transition will start with students and then move to staff. Training and support resources like workshops, online tutorials, and documentation are being provided.
- Successfully adopting new technologies requires addressing issues like changing habits and tip-of-the-iceberg costs beyond initial implementation. The benefits of Google Apps include low costs and flexibility to meet the University's evolving needs.
Pragmatic ethical and fair AI for data scientists (David Graus)
1. David Graus presented on pragmatic and fair AI for recruitment and news recommendations.
2. He discussed how algorithms can unintentionally learn and reflect human biases around gender and race. However, AI may also help address these biases, such as through representational ranking in recruitment to achieve demographic parity.
3. Graus also explored using editorial values like diversity, dynamism and serendipity to guide news recommendations, and found their system could increase dynamism without loss of accuracy through constrained intervention.
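The idea of re-ranking toward a representation target, mentioned in point 2, can be illustrated with a simplified greedy sketch. This is not the method from the talk: the function name, the tuple layout, and the prefix rule (every prefix of length n should hold at least floor(target_share * n) members of the group, as long as any remain) are assumptions made for illustration.

```python
import math


def rerank_for_parity(candidates, target_share):
    """Greedy re-ranking with a minimum group share in every prefix.

    candidates: list of (candidate_id, score, in_group) tuples (hypothetical layout).
    target_share: desired minimum fraction of group members in each prefix.
    """
    # Keep each subgroup sorted by score so the best remaining item is at index 0.
    in_group = sorted((c for c in candidates if c[2]), key=lambda c: -c[1])
    others = sorted((c for c in candidates if not c[2]), key=lambda c: -c[1])

    ranking = []
    group_count = 0
    while in_group or others:
        # Minimum number of group members required once this position is filled.
        needed = math.floor(target_share * (len(ranking) + 1))
        take_group = bool(in_group) and (
            not others
            or group_count < needed
            or in_group[0][1] >= others[0][1]
        )
        if take_group:
            chosen = in_group.pop(0)
            group_count += 1
        else:
            chosen = others.pop(0)
        ranking.append(chosen[0])
    return ranking
```

With a target share of 0.0 the output is the plain score ordering; raising the target moves group members up only at positions where the constraint would otherwise be violated, so the intervention stays limited.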
Zoeken, vinden, en aanbevelen: personalisatie vs. privacy (David Graus)
Talk at the VOGIN-IP-lezing on 28 March 2018 at the Openbare Bibliotheek Amsterdam.
DISCLAIMER: this talk is a fine piece of old-fashioned (human) manipulation: an expert shows up with five recommendations :-).
"These days, technology companies that collect user behavior on a large scale are viewed with growing suspicion. In this talk I set out why using user behavior matters, and how it is used to make information effectively accessible and searchable, whether for a search engine like Google, which has to find its way through a web of billions of pages, or a service like Spotify, which wants to keep offering its users the right music."
Layman's Talk: Entities of Interest --- Discovery in Digital Traces (David Graus)
The document outlines a program that includes a committee grilling a speaker at 10:00, the committee retreating afterwards, a ceremony at 10:15, and a reception downstairs from 11:00 to 12:30.
Slides of the talk I gave at PyData Amsterdam.
Abstract:
"The FD Mediagroep collects, analyses and filters valuable and relevant information, 24/7, for an influential group of professionals, business executives and high net worth individuals. Company.info (part of FDMG) provides complete, reliable, up-to-date company information and business news about no less than 2.7 million companies and other legal entities in the Netherlands. For Company.info we continuously monitor and crawl hundreds of (online) news sources, resulting in a large archive of (Dutch) business-related news, spanning hundreds of thousands of articles. These articles are automatically enriched, by linking the profiles of companies that are mentioned in the articles, using a custom in-house entity linking framework built in Python. In this talk, I will briefly explain the entity linking task, I will detail the implementation of our custom entity linking framework, and our pipeline for crawling and enriching news articles."
De Macht van Data --- Hoe algoritmen ons leven vormgeven (David Graus)
Slides of the introductory talk I gave at an event at De Balie: "De macht van data" on June 18th, 2017.
For a video recording of the talk see: http://graus.co/blog/mini-college-algoritmen/
Talk I gave at the Data Science Northeast Netherlands Meetup, where I detail the custom in-house entity linking framework, sentiment analysis, and entity salience scoring model we developed for Company.info, in addition to showing some example applications of our corpus of news articles linked to organization profiles.
Dynamic Collective Entity Representations for Entity Ranking (David Graus)
This document proposes using collective intelligence to dynamically enrich entity representations from multiple sources like knowledge bases, anchors, tags, and tweets. It presents an adaptive ranking model that learns optimal weights for ranking features like field similarity and importance over time. An experiment on query logs shows expanding entities with different sources improves ranking and retraining the ranker with new content further enhances performance.
Dynamic Collective Entity Representations for Entity Ranking (David Graus)
This document proposes using dynamic collective entity representations to improve entity ranking. It describes enriching static entity representations from knowledge bases with descriptions from dynamic sources like tweets, queries, and tags. An adaptive ranking model individually weights each description source and retrains over time using clicks. Experimental results show expanding representations and retraining the ranker improves ranking performance compared to a non-adaptive model, with different sources providing varying benefits depending on their dynamic nature and entity coverage.
David Graus presents his research on using semantic search techniques to improve information retrieval for digital forensic evidence from emails and other electronic documents. He discusses using social network analysis of communication patterns and language models of email content to predict likely recipients of emails. By combining these approaches, he is able to more accurately rank potential recipients than using either technique alone. Future work includes incorporating organizational structure and decay of communication patterns over time.
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th (David Graus)
David Graus from the University of Amsterdam gave a presentation on entity linking at the Search Engines Amsterdam conference on June 27th. He began by defining entity linking as linking mentions of entities in text to their corresponding entities in a knowledge base. He then gave an example of entity linking and discussed ranking entity candidates based on their prior probabilities like link probability and commonness. Finally, he described using both local and global features in supervised learning models to improve entity linking accuracy.
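The two priors named above can be sketched as follows, using the definitions that are common in the entity linking literature; the exact formulation used in the talk is not given here. Commonness is the fraction of times an anchor text links to a given entity, and link probability is the fraction of a text's occurrences that are links at all. The function name and the toy counts are hypothetical.

```python
from collections import Counter, defaultdict


def build_priors(anchor_links, text_counts):
    """Return (commonness, link_probability) functions.

    anchor_links: iterable of (anchor_text, entity) pairs from a hyperlinked corpus.
    text_counts: dict mapping a text to how often it occurs in total, linked or not.
    """
    anchor_entity = defaultdict(Counter)
    for anchor, entity in anchor_links:
        anchor_entity[anchor][entity] += 1

    def commonness(anchor, entity):
        # P(entity | anchor): share of this anchor's links that point to the entity.
        targets = anchor_entity.get(anchor, Counter())
        total = sum(targets.values())
        return targets[entity] / total if total else 0.0

    def link_probability(anchor):
        # P(link | text): share of the text's occurrences that are hyperlinks.
        linked = sum(anchor_entity.get(anchor, Counter()).values())
        occurrences = text_counts.get(anchor, 0)
        return linked / occurrences if occurrences else 0.0

    return commonness, link_probability


# Toy data, invented for illustration only.
links = [("jaguar", "Jaguar_Cars")] * 3 + [("jaguar", "Jaguar_(animal)")]
commonness, link_probability = build_priors(links, {"jaguar": 10})
```

In this toy corpus, `commonness("jaguar", "Jaguar_Cars")` is 0.75 and `link_probability("jaguar")` is 0.4. Candidates can be ranked by commonness alone as a baseline before local and global features are added in a supervised model.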
This document discusses understanding email traffic patterns through recipient recommendation. It explores using social network analysis and language models of email content to predict likely recipients of a given email. Specifically, it examines using measures of node importance in the network, strength of connections between nodes, and similarity between language models of communication profiles to rank and select recipient nodes. The findings indicate that combining social network analysis and language modeling performs better than either approach individually, and that language model similarity is most important for interpersonal communication, while network metrics are more informative for highly active users. Recipient recommendation could help with applications like anomaly detection in e-discovery.
Generating Pseudo-ground Truth for Detecting New Concepts in Social StreamsDavid Graus
The manual curation of knowledge bases is a bottleneck in fast paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline, by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.
yourHistory - entity linking for a personalized timeline of historic eventsDavid Graus
The document describes an entity linking approach to generate a personalized timeline of historic events for a user. It involves 4 main parts: (1) fetching candidate historic events from DBpedia, (2) generating a user profile based on information extracted from the user's Facebook profile, (3) matching the candidate events to the user's interests in their profile, and (4) scoring and ranking the events to produce the final personalized timeline. The approach uses entity linking techniques to associate mentions of entities in the user's profile with the corresponding entries in a knowledge base, in order to identify the user's interests.
This document discusses research on applying text mining and information retrieval techniques for fact finding in regulatory investigations from electronic documents. The researchers are developing methods for semantic search in e-discovery to iteratively retrieve relevant evidence from emails, forums, and other sources by integrating structural context and extracting knowledge from unstructured text. Their current work includes using Twitter mining as a form of conversational search and entity linking to semantically enrich documents.
Semantic Annotation of the Cyttron DatabaseDavid Graus
Final Presentation for my MSc Graduation Project.
Abstract:
"Semantic annotation uses human knowledge formalized in ontologies to enrich texts, by providing structured and machine-understandable information of its content. This paper proposes an approach for automatically annotating texts of the Cyttron Scientific Image Database, using the NCI Thesaurus ontology. Several frequency-based keyword extraction algorithms were implemented and evaluated, aiming to extract important concepts and exclude less relevant ones. Furthermore, topic classification algorithms were applied to identify important concepts which do not occur in the text. The algorithms were evaluated by comparison to annotations provided by experts. Semantic networks were generated from these annotations and an ontology-based similarity metric was applied to perform the comparison. Finally the networks were visualized to provide further insights into the differences of the semantic structure generated by humans, and the algorithms."
More information: http://graus.nu/category/thesis
1. Bias in Recommendations
@ SIKS Course "Advances in Information Retrieval"
David Graus
✉ david.graus@fdmediagroep.nl
🐦 @dvdgrs
2. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
whoami
• 🎓 Academia
  • BA Media Studies @ UvA (2008)
  • MSc Media Technology @ Universiteit Leiden (2012)
  • PhD Information Retrieval @ UvA (2017)
• 🏢 Industry
  • Editor radio/online public broadcaster NTR (between BA & MSc)
  • Research Intern @ Microsoft Research, US
  • Data Scientist @ Company.info (FD Mediagroep)
  • Lead Data Scientist @ FD SMART Journalism / BNR SMART Radio
8. In what is to follow…
• An introduction to FD Mediagroep
• Personalization & RecSys at FD Mediagroep
• Two flavors of bias in RecSys
  • Model/Algorithmic bias
  • Perceived bias in personalization
15. FD Mediagroup
The leading information provider in the financial-economic domain in the Netherlands
19. AI @ FD Mediagroup
24. Team
Dung Bahadir Anca Philippe
Maya David Feng Li’ao
Klaus Oberon Manon Azamat
28. AI @ FDMG: Academia/Industry
30. SMART Radio
• (Transcribe)
• Segment
• Tag
• Serve
31. Transcribe
33. Segment
• Based on metadata, text, and audio.
34. Tag
• Simple multilabel text classifier
• Trained on transcripts of segments + associated tags from the website
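The tagging step can be illustrated with a small sketch. The slides do not say which model is used, so the one-vs-rest perceptron below is an assumption chosen because it fits in a few lines; the function names (`tokenize`, `train_one_vs_rest`, `predict_tags`) and the toy transcripts are hypothetical and not taken from the SMART Radio code base.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real preprocessing)."""
    return text.lower().split()

def train_one_vs_rest(docs, tag_sets, max_epochs=100):
    """Train one binary perceptron per tag on bag-of-words features."""
    models = {}
    for tag in sorted(set().union(*tag_sets)):
        weights, bias = defaultdict(float), 0.0
        for _ in range(max_epochs):
            mistakes = 0
            for text, tags in zip(docs, tag_sets):
                words = set(tokenize(text))
                target = 1 if tag in tags else -1
                score = bias + sum(weights[w] for w in words)
                if target * score <= 0:  # misclassified: nudge the weights
                    for w in words:
                        weights[w] += target
                    bias += target
                    mistakes += 1
            if mistakes == 0:  # all training transcripts are classified correctly
                break
        models[tag] = (dict(weights), bias)
    return models

def predict_tags(models, text):
    """Return every tag whose perceptron scores the text above zero."""
    words = set(tokenize(text))
    return {tag for tag, (weights, bias) in models.items()
            if bias + sum(weights.get(w, 0.0) for w in words) > 0}

# Toy transcripts and tags, invented for illustration only.
transcripts = [
    "central bank raises interest rates",
    "chip maker fined by regulators",
    "bank fined by regulators",
    "new chip boosts phone speed",
]
tags = [{"economy"}, {"tech", "regulation"}, {"economy", "regulation"}, {"tech"}]

models = train_one_vs_rest(transcripts, tags)
print(predict_tags(models, "bank fined by regulators"))
```

Because each tag gets its own binary classifier, a segment can receive zero, one, or several tags, which is what distinguishes multilabel from multiclass classification.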
37. Serve
• iOS/Android app
48. SMART Journalism
• Moonshot: personalized summarization
• How to get there:
  • Content Understanding
  • Content-based Recommender System; <user, article>
  • Personalized snippet retrieval; <user, snippet-in-article>
  • Snippet-to-summary abstractor (?)
80. User Profile
Article read by the user: “Qualcomm krijgt bijna €1 mrd boete van Brussel” (Qualcomm gets a fine of nearly €1 bn from Brussels)
Tags: Boete (fine), Chips, EU, Mededinging (competition)
Section (Rubriek): Ondernemen (business)
Stylometry: CharLen=3491, WordLen=635
Entities: Qualcomm, Apple, NXP, Intel, Google
User Profile (takes over the article's features):
Tags: Boete (fine), Chips, EU, Mededinging (competition)
Section (Rubriek): Ondernemen (business)
Stylometry: CharLen=3491, WordLen=635
Entities: Qualcomm, Apple, NXP, Intel, Google
87. User Profile
Article 1 read by the user: “Qualcomm krijgt bijna €1 mrd boete van Brussel” (Qualcomm gets a fine of nearly €1 bn from Brussels)
Article 2 read by the user: “Topman van softwaremaker Salesforce kraakt grote techbedrijven” (Head of software maker Salesforce slams big tech companies)
Tags: Big Data, Blog, Davos, Google, Technologie (technology)
Section (Rubriek): Davos
Stylometry: CharLen=2856, WordLen=524
Entities: Google, Apple, Microsoft, Salesforce
User Profile (features of both articles combined):
Tags: Boete (fine), Chips, EU, Mededinging (competition), Big Data, Blog, Davos, Google, Technologie (technology)
Section (Rubriek): Ondernemen (business), Davos
Stylometry: CharLen=3491, WordLen=635, CharLen=2856, WordLen=524
Entities: Qualcomm, Apple (2), NXP, Intel, Google (2), Microsoft, Salesforce
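The way the profile accumulates article features on these slides can be sketched as follows. This is a minimal illustration that assumes the profile simply counts tags, sections and entities and keeps a list of length statistics; `empty_profile`, `update_profile` and the dictionary layout are hypothetical names, not the production implementation. The article values are the ones shown on the slides.

```python
from collections import Counter

def empty_profile():
    """A profile is a set of counters plus the raw length statistics."""
    return {"tags": Counter(), "sections": Counter(),
            "entities": Counter(), "lengths": []}

def update_profile(profile, article):
    """Add the features of one read article to the user's profile."""
    profile["tags"].update(article["tags"])
    profile["sections"].update([article["section"]])
    profile["entities"].update(article["entities"])
    profile["lengths"].append((article["char_len"], article["word_len"]))
    return profile

qualcomm = {
    "tags": ["Boete", "Chips", "EU", "Mededinging"],
    "section": "Ondernemen",
    "entities": ["Qualcomm", "Apple", "NXP", "Intel", "Google"],
    "char_len": 3491, "word_len": 635,
}
salesforce = {
    "tags": ["Big Data", "Blog", "Davos", "Google", "Technologie"],
    "section": "Davos",
    "entities": ["Google", "Apple", "Microsoft", "Salesforce"],
    "char_len": 2856, "word_len": 524,
}

profile = empty_profile()
for article in (qualcomm, salesforce):
    update_profile(profile, article)

print(profile["entities"]["Apple"], profile["entities"]["Google"])  # 2 2
```

Counting rather than overwriting is what produces the "Apple (2)" and "Google (2)" entries in the combined profile.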
88. Model
• Content-based RecSys
• Ranking w/ point-wise LTR
• Features: user, article, user-article features (~14k)
• Labels: implicit feedback
  • Clicks (i.e., click = 1, non-click = 0)
• Trained nightly
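Point-wise learning to rank, as listed on this slide, treats every <user, article> pair as an independent training example with a click label, and ranks candidates by the predicted score. The sketch below assumes a plain logistic regression trained with full-batch gradient descent; the slides do not name the actual model, and the two features (`tag_overlap`, `entity_overlap`), the training rows and the candidate articles are invented for illustration (the real system uses roughly 14k features).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_pointwise(X, y, lr=0.5, epochs=200):
    """Fit logistic regression on (features, click) pairs with batch gradient descent."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            error = sigmoid(b + sum(wi * xi for wi, xi in zip(w, features))) - label
            for j, xj in enumerate(features):
                grad_w[j] += error * xj
            grad_b += error
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def rank(candidates, w, b):
    """Sort candidate articles by predicted click probability, highest first."""
    scored = [(sigmoid(b + sum(wi * xi for wi, xi in zip(w, feats))), name)
              for name, feats in candidates.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical user-article features: [tag_overlap, entity_overlap].
X = [[1.0, 0.9], [0.7, 0.8], [0.2, 0.1], [0.1, 0.2]]
y = [1, 1, 0, 0]  # click = 1, non-click = 0

w, b = train_pointwise(X, y)
candidates = {
    "article_a": [0.9, 0.7],
    "article_b": [0.3, 0.1],
    "article_c": [0.6, 0.4],
}
ranking = rank(candidates, w, b)
print(ranking)
```

The model never compares two articles directly during training; the ranking only emerges from sorting the independent scores, which is the defining property of the point-wise approach.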
89. Bias?
• “Disproportionate weight in favor of or against an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair.”
90. Bias in RecSys
“Algorithmic”
I. In Collaborative Filtering methods
II. In implicit feedback/clicks
92. Collaborative Filtering
100. Bias in CF
• It is more difficult to predict ratings of infrequently rated items in Collaborative Filtering
• Bias: disproportionate weight in favor of popular items
• “It is generally not useful to recommend very popular items as they are generally already known by the user” [2]
• “A market that suffers from popularity bias will lack opportunities to discover more obscure products and will be, by definition, dominated by a few large brands […]” [3]
• Solution: cluster long-tail items
[1] Park & Tuzhilin. The Long Tail of Recommender Systems and How to Leverage It (RecSys ’08)
[2] Meyer, F. Recommender systems in industrial contexts (2012)
[3] Abdollahpouri et al. The Unfairness of Popularity Bias in Recommendation (RMSE@RecSys ’19)
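The idea of clustering long-tail items can be sketched roughly as follows: items with too few ratings to predict reliably borrow the pooled ratings of their cluster. This only illustrates the pooling step with a simple mean and is not the method from [1]; the threshold `min_ratings`, the hand-made `cluster_of` mapping and the toy ratings are assumptions.

```python
from collections import defaultdict
from statistics import mean

def predict_item_means(ratings, cluster_of, min_ratings=3):
    """Predict a rating per item: own mean for head items, cluster mean for tail items."""
    pooled = defaultdict(list)
    for item, values in ratings.items():
        if len(values) < min_ratings:  # tail item: pool its ratings per cluster
            pooled[cluster_of[item]].extend(values)
    predictions = {}
    for item, values in ratings.items():
        if len(values) >= min_ratings:
            predictions[item] = mean(values)
        else:
            predictions[item] = mean(pooled[cluster_of[item]])
    return predictions

# Toy data, invented for illustration only.
ratings = {
    "blockbuster": [5, 4, 5, 4],
    "indie_a": [2],
    "indie_b": [4, 5],
    "indie_c": [3],
}
cluster_of = {"indie_a": "drama", "indie_b": "drama", "indie_c": "documentary"}

predictions = predict_item_means(ratings, cluster_of)
print(predictions["blockbuster"])  # 4.5
```

Here `indie_a` and `indie_b` share one estimate based on three pooled ratings instead of one and two ratings respectively, which trades item-level detail for a less noisy prediction.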
104. Bias in implicit feedback
• Popular items are overrepresented in implicit feedback
• Position/“trust” bias (see Joachims et al., 2005)
  • Eye-tracking study + comparison w/ explicit feedback shows:
    • Clicks reflect relevance judgments
    • Results ranked highly receive more clicks
Joachims et al., Accurately Interpreting Clickthrough Data as Implicit Feedback (SIGIR ’05)
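Position bias means raw click counts partly measure where an item was shown rather than how relevant it was. One common way to account for this, which is not taken from the slides, is to compare an item's clicks with the clicks expected at the positions where it appeared (clicks over expected clicks). The sketch below assumes a log of `(position, item, clicked)` tuples; the log itself is invented.

```python
from collections import defaultdict

def ctr_by_position(impressions):
    """Click-through rate per rank position, estimated from the log."""
    shown, clicked = defaultdict(int), defaultdict(int)
    for position, _, was_clicked in impressions:
        shown[position] += 1
        clicked[position] += int(was_clicked)
    return {pos: clicked[pos] / shown[pos] for pos in shown}

def clicks_over_expected(impressions, item):
    """Observed clicks on an item divided by the clicks expected at its positions."""
    ctr = ctr_by_position(impressions)
    clicks = sum(c for _, it, c in impressions if it == item)
    expected = sum(ctr[pos] for pos, it, _ in impressions if it == item)
    return clicks / expected if expected else 0.0

# (position, item, clicked) tuples, invented for illustration only.
impressions = [
    (1, "a", 1), (1, "a", 1), (1, "a", 0), (1, "a", 0),
    (2, "b", 1), (2, "b", 0), (2, "c", 0), (2, "c", 0),
]

print(ctr_by_position(impressions))            # {1: 0.5, 2: 0.25}
print(clicks_over_expected(impressions, "a"))  # 1.0
print(clicks_over_expected(impressions, "b"))  # 2.0
```

Items "a" and "b" have the same raw click-through rate (0.5), but "b" earns it at the less visible second position, so it scores twice the expected clicks.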
106. Perceived Bias from RecSys
• Filter bubble: a state of intellectual isolation that allegedly can result from personalized searches when a website algorithm selectively guesses what information a user would like to see based on information about the user.
• As a result, users become separated from information that disagrees with their viewpoints.
109. Measuring personalization
• On average, 11.7% of results show differences due to personalization on Google.
• This varies widely by search query and by result ranking.
• Measurable personalization was only found as a result of searching with a logged-in account and of the IP address of the searching user.
110. David Graus • SIKS Course: Advances in Information Retrieval • 08/10/2019
Method
1. 👤
   1. Get 200 volunteers with Google accounts
   2. Have them issue the same set of queries
   3. Compare results
2. 🤖
   1. Construct Google bot accounts
      • Vary aspects such as location, demographics, click behavior, browsing + search history, etc.
   2. Have them issue the same set of queries
   3. Compare results
36[Hannák et al., 2013]
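The "Compare results" step can be sketched as follows. The slide does not say which comparison metric is used; this sketch assumes two common choices, the Jaccard index (same URLs, ignoring order) and edit distance (reordering). The result lists and URL names are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two result lists, ignoring rank: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty result pages are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def edit_distance(a, b):
    """Levenshtein distance between two ranked lists (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical top-4 results for the same query from two accounts.
account_a = ["u1", "u2", "u3", "u4"]
account_b = ["u1", "u3", "u2", "u5"]

overlap = jaccard(account_a, account_b)        # 3 shared URLs out of 5 distinct
reorder = edit_distance(account_a, account_b)  # edits needed to turn one list into the other
```

A Jaccard index below 1 means the accounts saw different URLs; a non-zero edit distance with a Jaccard index of 1 means the same URLs in a different order.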
👤 Findings
• On average, 11.7% of results show differences due to
personalization on Google.
• Top ranks tend to be less personalized than bottom ranks.
37[Hannák et al., 2013]
👤 Findings
• ✅ Personalization based on location (e.g., company names)
• ❌ The least personalized results tend to be for factual and
health-related queries.
38[Hannák et al., 2013]
🤖 Findings
✅ Logged in vs. “cleared cookies” account
✅ Geolocation
❌ Gender
❌ Age
❌ Search history
❌ Click history
❌ Browsing history
39[Hannák et al., 2013]
Diversity to pop the filter bubble
40
Method
• Split MovieLens users into two groups:
  • “Followers”: users who rated movies they were recommended
  • “Ignorers”: users who rated movies they were not recommended
• Compare between groups, over time:
  • Diversity of recommendations
  • Ratings of movies
41[Nguyen et al., 2014]
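One way to put a number on "diversity of recommendations" is the average pairwise distance between the recommended items. The sketch below assumes each movie is represented by a numeric feature vector and uses Euclidean distance; the slide does not specify the representation, and the vectors here are hypothetical.

```python
from itertools import combinations
from math import dist


def avg_pairwise_distance(vectors):
    """Mean Euclidean distance over all unordered pairs of item vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # fewer than two items: no diversity to measure
    return sum(dist(u, v) for u, v in pairs) / len(pairs)


# Hypothetical 2-dimensional movie vectors for two recommendation lists.
narrow_list = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
broad_list = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]

narrow_diversity = avg_pairwise_distance(narrow_list)
broad_diversity = avg_pairwise_distance(broad_list)
```

Computing this score per user per time window gives a series that can be compared between followers and ignorers.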
Findings
1. Diversity
   • In both groups, diversity decreases over time.
   • The effect is lessened for users who consume recommended items (followers).
2. Ratings
   • Slight decrease in average ratings for ignorers (3.74 to 3.55).
   • Stable average ratings for followers (~3.68).
42[Nguyen et al., 2014]
Diversity in RecSys 🤖 vs. humans 👤?
43
Method
• 🤖 Generate article recommendations for news articles using
different RecSys algorithms (collaborative filtering, CF, and content-based, CB).
• 👤 Compare to hand-picked article recommendations.
• Measure & compare “diversity” of recommended articles:
  • At content level
  • At tag level
  • At category level
  • At sentiment/subjectivity level
44[Möller et al. 2018]
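As an illustration of category-level diversity, the sketch below scores a recommendation list by the Shannon entropy of its category labels. This is an assumed metric for illustration, not necessarily the one used in the study, and the category names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Entropy (in bits) of the label distribution; higher means more varied."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical category labels of the articles in two recommendation lists.
algorithmic = ["politics", "politics", "politics", "politics"]
hand_picked = ["politics", "sports", "politics", "sports"]

algorithmic_diversity = shannon_entropy(algorithmic)
hand_picked_diversity = shannon_entropy(hand_picked)
```

The same function applies at tag level by passing tags instead of categories.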
Findings
“Conventional recommendation algorithms at least preserve the
topic/sentiment diversity of the article supply.”
45[Möller et al. 2018]
More diversity
46
Aim
Increase exposure to varied political opinions
with a goal of improving civil discourse
47[Yom-Tov et al. 2014]
Method
• Classify searchers by political leaning (using geo data)
48[Yom-Tov et al. 2014]
Method
• Infer political leaning of news sources from user behavior.
• Identify polarized search queries (with strong political leanings —
in both directions).
49[Yom-Tov et al. 2014]
Method
• Treatment group: Insert red results for blue users, and blue
results for red users
• Control group: Do not adjust results
50[Yom-Tov et al. 2014]
Method
1. Short term: Compare clicks/behavior between control & treatment.
2. Long term: Measure over two weeks, per user:
   1. Polarization: Difference between the user’s leaning score and the average leaning across all sources.
   2. Engagement: Average number of queries + average number of articles read.
51
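The polarization measure above can be sketched as follows. The sketch makes several assumptions not stated on the slide: source leanings lie on a scale from -1 to 1, a user's leaning score is the mean leaning of the sources they read, and the difference is taken as an absolute value. All source names and scores are hypothetical.

```python
from statistics import mean

# Hypothetical leaning score per news source (assumed scale: -1 to 1).
source_leaning = {"a": -0.8, "b": -0.2, "c": 0.4, "d": 0.6}


def user_leaning(read_sources, leaning):
    """Assumed user leaning: mean leaning of the sources the user read."""
    return mean(leaning[s] for s in read_sources)


def polarization(read_sources, leaning):
    """Absolute gap between the user's leaning and the mean over all sources."""
    baseline = mean(leaning.values())
    return abs(user_leaning(read_sources, leaning) - baseline)


# Hypothetical reading history of one user, before and after the two weeks.
before = ["a", "a", "b"]
after = ["a", "b", "c"]

polarization_before = polarization(before, source_leaning)
polarization_after = polarization(after, source_leaning)
```

A lower value after the measurement period means the user's reading moved toward the centre of the source distribution.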
Findings I
• Fewer clicks on inserted opposing sources.
• But:
“Results pages of the opposing viewpoint which had a similarity
higher than the average tended to be clicked 38% more than those
below the average.”
52[Yom-Tov et al. 2014]
Findings II
• Polarization:
  • Treatment: Average leaning ‘moves’ ~25% to centre
  • Control: Negligible difference (~1%)
• Engagement:
  • Treatment: Number of queries: +9% / articles read: +4%
  • Control: Small reduction in both (~2.5%)
53[Yom-Tov et al. 2014]
Refs
Algorithmic bias
1. Park & Tuzhilin, The Long Tail of Recommender Systems and How to Leverage It (RecSys ’08)
2. Meyer, Recommender systems in industrial contexts (2012)
3. Abdollahpouri et al., The Unfairness of Popularity Bias in Recommendation (RMSE@RecSys ’19)
4. Joachims et al., Accurately Interpreting Clickthrough Data as Implicit Feedback (SIGIR ’05)
Perceived bias / filter bubbles
5. Hannák et al., Measuring personalization of web search (WWW ’13)
6. Nguyen et al., Exploring the filter bubble: the effect of using recommender systems on content diversity (WWW ’14)
7. Möller et al., Do not blame it on the algorithm — An empirical assessment of multiple recommender systems and their impact
on content diversity (Information Communication and Society ’18)
8. Yom-Tov et al., Promoting Civil Discourse Through Search Engine Diversity (Social Science Computer Review, ’13)
54