Exploratory Analysis of User Data
1. Exploratory Analysis of User Data: 1st RAIS Summer School May 2021, Online Event
Behrooz Omidvar-Tehrani
Research Scientist at Grenoble AI Institute
http://www.omidvar.info
Intensive course in RAIS summer school, 17-19 May 2021
About the instructor
• Behrooz Omidvar-Tehrani, PhD in Computer Science and Applied Mathematics
• Research focus on interactive data analysis, at the crossroads of machine learning, data science, and data mining.
• 2016-2017: Postdoctoral Researcher at The Ohio State University
• 2017-2018: Postdoctoral Researcher at Grenoble Alpes University
• 2019-2020: Research Scientist at Naver Labs Europe
• 2021-Present: Research Scientist at Grenoble AI Institute
Why user data?
• Because user data is ubiquitous.
• Users are very active on the Web, constantly generating user data.
• Here is what happened on the Web in the last 5 minutes (per http://pennystocks.la/internet-in-real-time):
  • 3M new tweets posted on Twitter
  • 24M videos watched on YouTube
  • 274K photos uploaded to Instagram
  • 8M photos liked on Instagram
  • 22M searches performed on Google
  • 16M posts added on Facebook
  • 12M messages sent on WhatsApp
  • 51K video hours watched on Netflix
  • 1M users participated in a Zoom call
Hunger for user data
• The number of requests to obtain user data has increased drastically.
• Google received 48,941 government data requests affecting 83,345 user accounts in the first six months of 2017. The United States issued 16,823 of these requests.
• Google Dataset Search indexes almost 25 million datasets (https://blog.google/products/search/discovering-millions-datasets-web/).
User data analysis pipeline
• User data is voluminous and noisy, which makes it hard to get insights from.
• An analysis pipeline is often designed to tackle the challenges of volume and noise. We call it a user data analysis (UDA) pipeline.
• Why post-processing?
Because mined results and recommendations need to be rendered in a human-understandable form.
• Why user data presentation?
When digesting insights, the human brain performs better on visual elements than on textual information.
• Why user data exploration?
Users cannot exhaustively scan through all discovered groups.
Figure: the UDA pipeline. Raw user data goes through User Data Preparation (towards less noise) and User Data Mining, Learning, and Recommendation (towards less volume), then through post-processing to User Data Presentation and User Data Exploration, where the User interacts with the results.
[Omidvar-Tehrani, Amer-Yahia, Simon @ HILDA’19]
User roles in UDA pipelines
• Users with different roles and needs write UDA pipelines to achieve their tasks:
  • Data scientist, who brings analysis expertise
  • Domain expert, who brings domain knowledge
  • Information consumer, who brings the task
Objectives and the timeline of the course
Objectives
• Motivate UDA and UDA pipelines and illustrate their importance in practice
• Understand the underlying structure of user data in its general form
• Walk through UDA pipelines and discuss their components, from preparation to exploration
• Work on hands-on exercises to observe the challenges of UDA implementation in practice
• Get familiar with the state of the art in UDA research
Timeline
• Session 1. Monday 17 May 2021 at 10:30 - 12:30 (Introduction, User Data Preparation and Visualization)
• Session 2. Tuesday 18 May 2021 at 10:30 - 12:30 (User Data Mining and Recommendation)
• Session 3. Wednesday 19 May 2021 at 10:30 - 12:30 (User Data Exploration with Reinforcement Learning)
Topics covered in the course
Figure: the UDA pipeline annotated with sessions. User Data Preparation and User Data Presentation are covered in Session 1, User Data Mining, Learning, and Recommendation in Session 2, and User Data Exploration in Session 3.
• User data preparation (Session 1)
  • What is the general model behind all user datasets?
  • How to prepare user data for analysis?
  • How to increase the quality of user data?
• User data mining, learning, and recommendation (Session 2)
  • How to discover (mine) insights in user data?
  • How to build a recommender engine for user data?
  • How to recommend to a group of users?
• User data presentation (Session 1)
  • How to make sense of user data?
  • How to discuss user data with collaborators?
• User data exploration (Session 3)
  • How to build interactive user data analysis systems?
  • How to learn interactions with user data?
  • How to guide users in labor-intensive tasks?
Course material
• This course is interactive. You will participate in 10 polls throughout the course.
• Hands-on exercises. Code templates are delivered at the end of each session to practice the material.
• Course slides. Available at http://www.omidvar.info/#activities (“teaching” section).
• Questions. Please ask them during the sessions. For all other questions, email me at behrooz@omidvar.info.
About exercises
• Hands-on #1: Research paper finder
  Practicing data crawling and data collection. Requirement: Python
• Hands-on #2: D3 histogram
  Practicing user data visualization. Requirement: JavaScript and HTML
• Hands-on #3: Mining user groups
  Practicing user data mining and itemset mining. Requirement: Python, basic C, basic command line
• Hands-on #4: Multi-objective mining
  Practicing multi-objective optimization. Requirement: Java
• Hands-on #5: Recommendation
  Practicing recommendation algorithms. Requirement: Python
• Hands-on #6: Implementing exploration semantics
  Practicing data / problem modeling. Requirement: Math and Logic
• Hands-on #7: Designing a Markov Decision Process
  Practicing Markov Decision Processes. Requirement: Math and Logic
• Hands-on #8: RL for Exploratory User Data Analysis
  Practicing reinforcement learning. Requirement: Python
Poll: Prioritizing actions in user data analysis
• Question. You are a data scientist in a company that owns terabytes of user data. They ask you to deliver some good insights about their data, but they don’t have any specific questions to ask (or any hypotheses to form). They give you only one week to deliver results. How do you prioritize your actions?
• Popular answers (share of votes):
  • (A1, 35%) I start cleaning the data, building a visualization dashboard, and presenting some insights using the dashboard.
  • (A2, 30%) I prepare the data for exploration and ask the data owners to navigate the data and evaluate some hypotheses.
  • (A3, 25%) I don't start the implementation; I'll first think on paper for a bit, in order to come up with a good pipeline plan.
  • (A4, 5%) I start performing some predictions on the raw data, followed by some post-processing steps.
  • (A5, 5%) I perform some mining on the raw data, followed by some post-processing steps.
SESSION 1: User Data Preparation and Visualization
User data model
• User data is a (complex) bipartite graph between the set of users 𝒰 and the set of items ℐ.
• Attributes 𝒜 describe both users and items.
Figure: users 𝒰, described by demographics (gender, age, occupation, location, health status), are connected through temporal actions to items ℐ (e.g., movies, medicine, groceries, music, books, tweets).
[Omidvar-Tehrani, Amer-Yahia @ TKDE’20]
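The model above can be sketched as a small Python data structure. This is a minimal illustration, not the representation used in the cited work: the class `UserData`, its fields, and the sample records are hypothetical. Users and items carry attribute dictionaries (playing the role of 𝒜), and each action is a timestamped edge between a user and an item.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UserData:
    """Bipartite user data: users on one side, items on the other, actions as edges."""
    users: Dict[str, Dict[str, str]] = field(default_factory=dict)  # user id -> attributes
    items: Dict[str, Dict[str, str]] = field(default_factory=dict)  # item id -> attributes
    # Each action is an edge: (user id, item id, action type, date)
    actions: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def add_action(self, user: str, item: str, action: str, date: str) -> None:
        # An edge may only connect a known user to a known item
        if user not in self.users or item not in self.items:
            raise KeyError("unknown user or item")
        self.actions.append((user, item, action, date))

    def items_of(self, user: str) -> List[str]:
        # Neighbors of a user node on the item side of the graph
        return [i for (u, i, _, _) in self.actions if u == user]


data = UserData()
data.users["amber"] = {"gender": "female", "occupation": "pianist"}
data.items["titanic"] = {"type": "movie", "genre": "Romance, Drama"}
data.add_action("amber", "titanic", "like", "2021-05-17")
print(data.items_of("amber"))  # ['titanic']
```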
Links between users
• Users are not independent entities; they are connected through social links.
• Social links can be explicit (friendship on Facebook, following on Twitter, co-authorship) or implicit (like-minded users). For example:
  • Explicit link. Mary (a female engineer) and John (a male student) are explicitly linked through their friendship on Facebook.
  • Implicit link. Elena (a female professor) and Amber (a female pianist) are implicitly linked through their interest in drama-genre movies: Elena likes The Godfather (Crime, Drama), and Amber likes Titanic (Romance, Drama).
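Implicit links can be derived from the data itself. A minimal sketch, assuming each movie's genres are known and using the illustrative rule that two users are linked when their liked movies share at least one genre (other similarity rules are equally possible); the data mirrors the Elena and Amber example:

```python
from itertools import combinations

# Genres of each movie (from the example above)
movie_genres = {
    "The Godfather": {"Crime", "Drama"},
    "Titanic": {"Romance", "Drama"},
}
# Movies each user likes
likes = {"Elena": ["The Godfather"], "Amber": ["Titanic"]}


def liked_genres(user):
    """Union of the genres of all movies the user likes."""
    genres = set()
    for movie in likes[user]:
        genres |= movie_genres[movie]
    return genres


def implicit_links(users):
    """Link every pair of users whose liked movies share at least one genre."""
    links = []
    for a, b in combinations(users, 2):
        shared = liked_genres(a) & liked_genres(b)
        if shared:
            links.append((a, b, sorted(shared)))
    return links


print(implicit_links(["Elena", "Amber"]))  # [('Elena', 'Amber', ['Drama'])]
```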
Simple data structure but rich value
• The simple bipartite structure of user data contains many pieces of useful information. For example, Amber is a female pianist, and Amber likes Titanic:
  • Item attributes. Titanic was produced in 1997 by James Cameron, starring Leonardo DiCaprio and Kate Winslet.
  • Action attributes. Amber liked the movie Titanic on 17 May 2021, at 3365 Indiana Street, San Diego, USA.
  • User groups. Amber belongs to the group of female pianists in California, with 34K members.
  • Abstract user groups. Amber also belongs to the group of females, the group of pianists, the group of Californians, and the group of Titanic lovers.
  • Abstract user attributes. Amber is also an artist.
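The groups a user belongs to can be enumerated mechanically: each non-empty subset of the user's attribute values describes one group, from the most specific to the most abstract. A sketch assuming Amber's profile is stored as attribute-value pairs (the attribute names are hypothetical); item-based groups such as Titanic lovers would come from the actions and are left out here.

```python
from itertools import combinations

amber = {"gender": "female", "occupation": "pianist", "state": "California"}


def groups_of(user):
    """Return every group description: each non-empty subset of attribute-value pairs."""
    pairs = sorted(user.items())
    groups = []
    for size in range(len(pairs), 0, -1):  # most specific groups first
        for subset in combinations(pairs, size):
            groups.append(dict(subset))
    return groups


for group in groups_of(amber):
    print(group)
```

With n attributes, a user belongs to 2^n - 1 such groups (7 for Amber here), which is one reason the space of candidate groups grows quickly.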
User data preparation
• User data preparation is the process of turning raw user data into a form ready for UDA.
• The outcome of user data preparation is another version of the user data with less noise.
Figure: within the UDA pipeline, user data preparation breaks down into Extract, Transform, Load (ETL), User Data Ingestion, User Data Integration, User Data Cleaning, and User Data Post-processing (Augmentation, Delivery).
Extract, Transform, Load (ETL)
• The first step in user data preparation is called ETL.
• Extract is the first phase of ETL, in which user data is extracted from a source. The literature often also places the “ingestion” and “integration” steps in this first phase.
• Transform is an intermediate phase that applies a set of rules and predefined functions to prepare the data for loading. The literature often also treats “data cleaning” as part of this phase.
• Load is the last phase, which places the data in the hosting structure, such as a relational or NoSQL database.
Where to obtain (public) user data?
• Collect user data using Amazon Mechanical Turk, SurveyMonkey, and other similar platforms.
• Crawl user data using BeautifulSoup and other similar libraries. This process is also called web scraping.
• Download the data from dataset repositories, e.g., UCI, Kaggle, GitHub, Google Dataset Search, Harvard Dataverse.
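The three ETL phases can be sketched end to end with the Python standard library. In this minimal example the raw CSV content, the cleaning rules, and the table name `users` are hypothetical: extraction reads the rows, transformation normalizes values and drops records without an identifier, and loading writes the result into an in-memory SQLite (relational) database.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: inconsistent casing, stray spaces, missing values
RAW_CSV = """id,gender,age
u1, Female ,34
u2,male,
,female,51
u3,MALE,27
"""


def extract(text):
    """Extract: read raw records from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply simple cleaning rules before loading."""
    clean = []
    for row in rows:
        if not row["id"].strip():  # drop records without an identifier
            continue
        gender = row["gender"].strip().lower()
        age = int(row["age"]) if row["age"].strip() else None  # empty -> NULL
        clean.append((row["id"].strip(), gender, age))
    return clean


def load(rows):
    """Load: place the data into the hosting structure (an in-memory SQLite DB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, gender TEXT, age INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


conn = load(transform(extract(RAW_CSV)))
print(conn.execute("SELECT * FROM users ORDER BY id").fetchall())
# [('u1', 'female', 34), ('u2', 'male', None), ('u3', 'male', 27)]
```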
Data acquisition using crawling
• We crawl data when no direct and easy access to the data in question is available.
• Before crawling, we always have to check copyright issues. Also note that some websites offer their own APIs.
• Webpages with a regular structure are the best candidates for crawling.
• Beautiful Soup is a Python library for pulling data out of HTML (https://www.crummy.com/software/BeautifulSoup/bs4/doc/). The following code prints the titles of SIGMOD 2020 papers on dblp that mention one of the keywords.
from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 (urllib2 only exists in Python 2)

url = "https://dblp.org/db/conf/sigmod/sigmod2020.html"
keywords = ["user data"]
page = urlopen(url)
soup = BeautifulSoup(page, "html.parser")
# dblp wraps each paper title in <span class="title">
papers = soup.find_all("span", {"class": "title"})
for paper in papers:
    paper_str = paper.text
    for keyword in keywords:
        # case-insensitive match, since titles are usually capitalized
        if keyword.lower() in paper_str.lower():
            print(paper_str)
            break
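If installing Beautiful Soup is not an option, the same extraction can be sketched with Python's built-in `html.parser`. The HTML snippet below is a hypothetical stand-in for a dblp page; like the code above, it only assumes that titles sit in `<span class="title">` elements (the simple end-tag handling would not cope with spans nested inside a title).

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of every <span class="title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "title") in attrs:
            self.in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles[-1] += data


html = """
<li><span class="title">Exploring User Data at Scale.</span></li>
<li><span class="title">Query Optimization Revisited.</span></li>
"""
parser = TitleParser()
parser.feed(html)
keywords = ["user data"]
matches = [t for t in parser.titles if any(k in t.lower() for k in keywords)]
print(matches)  # ['Exploring User Data at Scale.']
```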
Hands-on 1: Research paper finder
• Task. Write Python code that automatically finds all research papers (and their authors) about a given set of keywords 𝒲, where 𝒲 is an input parameter.
• Download the Python code paper-finder.py from the following link, and complete it: https://drive.google.com/drive/folders/1M-HlNao9tYwqN0imeZ-SzHnGZKMoJgh4?usp=sharing.
• Missing parts are marked with a TODO comment.
The DM Authors dataset is built in the same way. It is available on the PerSCiDO platform via https://doi.org/10.18709/perscido.2016.10.ds32
[Omidvar-Tehrani, Amer-Yahia, Termier @ CIKM’15]
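Once titles and authors have been crawled, the keyword-matching step of the task could look like the sketch below. This is not the official solution of paper-finder.py: the record layout (dicts with `title` and `authors` keys), the function name `find_papers`, and the sample entries are hypothetical.

```python
def find_papers(papers, keywords):
    """Return (title, authors) for papers whose title mentions any keyword.

    papers: list of dicts with hypothetical keys "title" and "authors".
    keywords: the input set W; matching is case-insensitive.
    """
    lowered = [k.lower() for k in keywords]
    results = []
    for paper in papers:
        title = paper["title"].lower()
        if any(k in title for k in lowered):
            results.append((paper["title"], paper["authors"]))
    return results


papers = [
    {"title": "Interactive Exploration of User Data", "authors": ["A. Author", "B. Author"]},
    {"title": "Efficient Join Processing", "authors": ["C. Author"]},
    {"title": "Mining User Groups", "authors": ["D. Author"]},
]
print(find_papers(papers, {"user data", "user groups"}))
```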
Advanced data collection
• Nowadays most web pages are highly dynamic, and their content (often rendered by JavaScript in the browser) is harder to collect.
• ScrapingBee is a web scraping API that handles headless browsing for you, so that websites are less likely to block the crawling process.
• Selenium is an open-source project for browser automation. The following code logs in to a webpage protected by a login form (Selenium 4 API).
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# DRIVER_PATH, USERNAME and PASSWORD are placeholders you must define
options = Options()
options.add_argument("--headless")  # run Chrome without a visible window
driver = webdriver.Chrome(service=Service(DRIVER_PATH), options=options)
try:
    driver.get("https://news.ycombinator.com/login")
    # Fill in and submit the login form
    driver.find_element(By.XPATH, "(//input[@type='text'])[1]").send_keys(USERNAME)
    driver.find_element(By.XPATH, "//input[@type='password']").send_keys(PASSWORD)
    driver.find_element(By.XPATH, "//input[@value='login']").click()
    print(driver.page_source)  # page content after logging in
finally:
    driver.quit()
User data cleaning
• Data cleaning is the process of detecting and removing noise in data.
• The cleanliness of data can be evaluated using different measures, such as validity, accuracy, completeness, consistency, and uniformity.
• User data cleaning techniques:
  • Dealing with missing values
  • Dealing with outliers
  • Data improvement
  • Data tidy-up
  • Scaling
User data cleaning techniques: missing values
• Missing values are considered noise.
• In user datasets, many attribute values are missing (e.g., gender, occupation, visitation date).
• When values are missing, we either drop them or impute them.
• Dropping is often performed using a threshold on the rate of missing values in a row or a column.
• Imputation preserves the data size; hence, it is often preferable to dropping.
• Numerical imputation. Replace the missing values (e.g., None) with a default value such as 0. The median is another value to consider (why not the average?).
• Categorical imputation. Replace the missing values with the most frequent value in the column; otherwise, use “other”.
# data is a pandas DataFrame
threshold = 0.7
# Drop columns whose missing-value rate is at or above the threshold
data = data[data.columns[data.isnull().mean() < threshold]]
# Drop rows whose missing-value rate is at or above the threshold
data = data.loc[data.isnull().mean(axis=1) < threshold]
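The two imputation rules can be sketched without pandas. In this minimal example the column names and values are hypothetical, and `impute` assumes every column has at least one non-missing value. Note that the median age (31) is unaffected by the extreme value 90, whereas the average would be about 48.7.

```python
from collections import Counter
from statistics import median

users = [
    {"age": 25, "occupation": "student"},
    {"age": None, "occupation": "engineer"},
    {"age": 31, "occupation": None},
    {"age": 90, "occupation": "student"},
]


def impute(rows, numeric, categorical):
    """Fill None values: median for numeric columns, most frequent value for categorical ones."""
    filled = [dict(r) for r in rows]  # work on a copy; the data size is preserved
    for col in numeric:
        med = median(r[col] for r in rows if r[col] is not None)
        for r in filled:
            if r[col] is None:
                r[col] = med
    for col in categorical:
        counts = Counter(r[col] for r in rows if r[col] is not None)
        mode = counts.most_common(1)[0][0] if counts else "other"
        for r in filled:
            if r[col] is None:
                r[col] = mode
    return filled


filled = impute(users, numeric=["age"], categorical=["occupation"])
print(filled)
```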
User data cleaning techniques: outliers
• Outliers are considered potential noise.
• An outlier is a piece of data that deviates markedly from the rest of the data.
• Methods for outlier detection include visualization (the most effective method), standard deviation, and percentiles.
  • If a value lies farther from the average than X times the standard deviation, it can be considered an outlier.
  • A certain percentage of the values at the top or the bottom can be considered outliers.
• Outlier values can be either dropped or capped.
  • As with missing-data techniques, the former does not maintain the data size, while the latter does.
Is Brazil an outlier? What about Burundi?
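Both detection rules and both treatments can be sketched with the standard library. The values are hypothetical daily login counts; x = 2 and p = 15 are illustrative choices (with only 7 values, p = 15 trims one value per side). A single extreme value also inflates the standard deviation: with x = 3, the value 95 would not be flagged here.

```python
from statistics import mean, pstdev


def zscore_outliers(values, x=2.0):
    """Return the values lying more than x standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) > x * sigma]


def percentile_bounds(values, p):
    """Lower/upper bounds that leave out roughly the bottom and top p% of values."""
    s = sorted(values)
    k = int(len(s) * p / 100)  # number of values cut on each side
    return s[k], s[len(s) - 1 - k]


def cap(values, p):
    """Capping (winsorizing): clip values to the percentile bounds, keeping the data size."""
    low, high = percentile_bounds(values, p)
    return [min(max(v, low), high) for v in values]


def drop(values, p):
    """Dropping: remove values outside the percentile bounds; the data size shrinks."""
    low, high = percentile_bounds(values, p)
    return [v for v in values if low <= v <= high]


logins = [10, 12, 11, 13, 12, 11, 95]
print(zscore_outliers(logins))  # [95]
print(cap(logins, 15))          # [11, 12, 11, 13, 12, 11, 13]
print(drop(logins, 15))         # [12, 11, 13, 12, 11]
```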
User data cleaning techniques: data improvement
• Data cleaning is not only about reducing noise but also about increasing the utility of user data.
• Examples of data improvement techniques are binning and log transform.
Figure panels: Percentage binning; Log transform.
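A sketch of both techniques, under the assumption that percentage binning means assigning each value to a bin by its percentile rank (quartiles here, i.e., equal-frequency bins); the sample counts are hypothetical, and `log1p` is used so that zero counts stay defined.

```python
import math


def percentage_bins(values, n_bins=4):
    """Equal-frequency binning: each bin receives roughly the same share of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)  # bin index 0 .. n_bins-1
    return bins


def log_transform(values):
    """log(1 + x): compresses large values and keeps x = 0 defined (assumes x >= 0)."""
    return [math.log1p(v) for v in values]


ratings_per_user = [1, 3, 5, 8, 20, 40, 200, 1000]
print(percentage_bins(ratings_per_user))  # [0, 0, 1, 1, 2, 2, 3, 3]
print([round(v, 2) for v in log_transform(ratings_per_user)])
```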
User data cleaning techniques: data tidy-up
• A user dataset is called tidy if and only if every row represents a user and every column represents a feature.
• Tidy datasets are easy to manipulate, model, and visualize.
• Grouping is the process of turning an untidy dataset into a tidy one. Common grouping operations are average, sum, and concatenation.
• Is ungrouping (tidy to untidy) necessary too?
Example of grouping (average):
Transaction user dataset (untidy)
user | score
u1 | 65
u2 | 14
u1 | 32
u3 | 60
u2 | 30
u1 | 90
Tidy user dataset
user | average score
u1 | 62.33
u2 | 22
u3 | 60
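The grouping in the table above can be reproduced in a few lines of standard-library Python; the function name `group_average` is illustrative.

```python
from collections import defaultdict

# Transaction (untidy) dataset from the table above: one row per score
transactions = [("u1", 65), ("u2", 14), ("u1", 32), ("u3", 60), ("u2", 30), ("u1", 90)]


def group_average(rows):
    """Tidy up by grouping: one entry per user with the average of their scores."""
    scores = defaultdict(list)
    for user, score in rows:
        scores[user].append(score)
    return {user: round(sum(s) / len(s), 2) for user, s in sorted(scores.items())}


print(group_average(transactions))  # {'u1': 62.33, 'u2': 22.0, 'u3': 60.0}
```

Swapping the average for `sum` or string concatenation gives the other common grouping operations.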
Data cleaning frameworks
• Tamr by Michael Stonebraker (ACM Turing Award winner), focusing on data mastering and unification.
• Apple Inductiv by Christopher Ré, Ihab Ilyas, and Theodoros Rekatsinas, focusing on employing artificial intelligence to automate the task of identifying and correcting errors in data.
• HoloClean by the same leaders as Inductiv, focusing on providing a machine learning system for data repair and predictions on structured data.
• openclean by NYU Data Science, focusing on providing a Python library for data preprocessing and cleaning.
• Learn2Clean by Laure Berti-Equille, focusing on providing a Python library for data preprocessing and cleaning based on Q-Learning.
Poll: Prioritizing data cleaning techniques
• Question. You are the head of a data engineering team in a healthcare company. The company's user data is entered manually by nurses and is therefore noisy: patient information contains many missing and possibly inaccurate values. How do you prioritize between the data cleaning techniques?
Figure: bar chart of the number of votes per data cleaning technique (feature split, dropping, grouping, scaling, imputation, binning, log transform).
User data visualization
• Sensemaking of user data using visual variables.
• A visualization component consists of three building blocks: views, visual variables, and visual elements.
• Visualization can be done either at the beginning or at the end of UDA pipelines, for hypothesis testing and validation, respectively.
• At the core of visualizing user data is a mapping function that associates user characteristics with visual variables.
Figure: a visualization of the MovieLens dataset, with its view, visual variables, and visual elements highlighted.
[Zegarra et al., FGCS’20]
[Heer and Hellerstein, VLDB’09]
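A minimal sketch of such a mapping function. The choice of encodings (age to horizontal position, gender to color, number of ratings to mark radius), the color names, and the value ranges are assumptions for illustration.

```python
def linear(domain, rng):
    """Return a function mapping numbers in the domain interval onto the range interval."""
    (d0, d1), (r0, r1) = domain, rng
    return lambda v: r0 + (v - d0) * (r1 - r0) / (d1 - d0)


GENDER_COLOR = {"female": "orange", "male": "steelblue"}  # categorical -> color
x_of_age = linear((0, 100), (0, 500))                      # numeric -> horizontal position
size_of_activity = linear((0, 1000), (2, 20))              # numeric -> mark radius


def visual_mark(user):
    """The mapping function: user characteristics -> visual variables of one mark."""
    return {
        "x": x_of_age(user["age"]),
        "color": GENDER_COLOR.get(user["gender"], "gray"),
        "radius": size_of_activity(user["n_ratings"]),
    }


print(visual_mark({"age": 30, "gender": "female", "n_ratings": 250}))
# {'x': 150.0, 'color': 'orange', 'radius': 6.5}
```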
Types of visualization
• User data can be visualized with typical visualization tools such as Tableau, or with more specialized approaches such as graph-based or location/time-based visualization. Examples:
  • Off-the-shelf visualization. Visualization grammars [Satyanarayan et al., TVCG’17] facilitate creating, saving, and sharing visual analytics.
  • Graph-based visualization. In NodeTrix [Henry et al., TVCG’07], user groups are shown using node-link diagrams and adjacency matrices.
  • Geospatial and temporal visualization. In Bike Angels [Chung et al., COMPASS’18], dispatchers are informed which stations need bikes added (in blue) or removed (in red).
  • Application-dependent visualization. In soccer analytics [Machado et al., CG’17], players are users, and their actions are visualized to obtain insights.
Web-based visualization
• D3.js is a JavaScript library for web-based visualization. (Why web-based?)
• D3 stands for Data-Driven Documents.
• The starting point is often the visualization zoo at https://d3js.org.
Developed by Mike Bostock, Vadim Ogievetsky, and Jeffrey Heer at Stanford University
<div id="scatter_area"></div>
<script src="https://d3js.org/d3.v4.js"></script>
<script>
var margin = …
var svg = d3.select("#scatter_area") …
var data = [ {x:10, y:20}, {x:40, y:90}, {x:80, y:50} ]
var x = d3.scaleLinear() …
var y = d3.scaleLinear() …
svg.selectAll("whatever").data(data).enter() …
</script>
[Bostock et al., TVCG’11]
The x axis of the scatter plot is created with a linear scale and a bottom axis:
var x = d3.scaleLinear()
.domain([0, 100])
.range([0, width]);
svg.append('g')
.attr("transform", "translate(0," + height + ")")
.call(d3.axisBottom(x));
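`d3.axisBottom(x)` also chooses where to draw ticks. The sketch below is a simplified Python re-implementation of the idea behind D3's tick selection (a step equal to a power of ten times 1, 2, or 5), not D3's exact code; the function names are hypothetical and it assumes `start` differs from `stop`.

```python
import math


def tick_step(start, stop, count=10):
    """Pick a 'nice' tick spacing: a power of ten times 1, 2, or 5."""
    raw = abs(stop - start) / count
    power = 10 ** math.floor(math.log10(raw))
    error = raw / power
    if error >= math.sqrt(50):
        power *= 10
    elif error >= math.sqrt(10):
        power *= 5
    elif error >= math.sqrt(2):
        power *= 2
    return power


def ticks(start, stop, count=10):
    """Tick values covering [start, stop] at the chosen step."""
    step = tick_step(start, stop, count)
    first = math.ceil(start / step)
    last = math.floor(stop / step)
    return [k * step for k in range(first, last + 1)]


print(ticks(0, 100))     # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(ticks(0, 100, 5))  # [0, 20, 40, 60, 80, 100]
```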
Hands-on 2: D3 histogram
• Task. We are given a CSV file containing the hours at which users logged in to a platform under investigation. Visualize a histogram of this data using D3.
• Download the content of the sub-folder D3-Histogram from the following link, and complete it: https://drive.google.com/drive/folders/1f82RplHgLte223QoD99UIKEM3IJSV4y5?usp=sharing.
• Missing parts are marked with a TODO comment.
• Important. You need a local web server to run this example. You can simply use:
$ python -m SimpleHTTPServer 8000    # Python 2
$ python3 -m http.server 8000        # Python 3
The resulting figure shows that the peak hours were around 11 AM and 5 PM. It also shows that no logins happened in the early morning.
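The bar heights of this histogram are login counts per hour, which D3 computes in the browser (`d3.histogram` in D3 v4, renamed `d3.bin` in later versions). A Python sketch of the same binning, assuming a CSV with a single hypothetical `hour` column holding values from 0 to 23; the sample rows are made up.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content: one login hour (0-23) per row
LOGINS_CSV = "hour\n9\n11\n11\n12\n17\n17\n17\n22\n"


def hourly_histogram(text):
    """Count logins per hour: 24 bins, one per hour of the day."""
    hours = [int(row["hour"]) for row in csv.DictReader(io.StringIO(text))]
    counts = Counter(hours)
    return [counts.get(h, 0) for h in range(24)]


bins = hourly_histogram(LOGINS_CSV)
print(bins.index(max(bins)))  # 17, the peak hour in this toy sample
```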
Crossfilter
• Crossfilter is a JavaScript library focusing on fast multidimensional filtering for coordinated views.
• In other words, Crossfilter brings interactivity to visualizations.
• Source files are accessible via https://github.com/crossfilter/crossfilter. See examples at https://drarmstr.github.io/chartcollection/examples/#worldbank.
[Omidvar-Tehrani et al., ICDE’17]
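The core idea of coordinated filtering can be sketched in a few lines of Python. This is not Crossfilter's API or implementation; it only mimics one documented behavior: a group's counts reflect all current filters except the filter on the group's own dimension, so each chart keeps showing its full context. The records and filters are hypothetical.

```python
users = [
    {"age": 23, "gender": "female", "occupation": "student"},
    {"age": 35, "gender": "male", "occupation": "engineer"},
    {"age": 41, "gender": "female", "occupation": "engineer"},
    {"age": 29, "gender": "female", "occupation": "artist"},
    {"age": 52, "gender": "male", "occupation": "artist"},
]

# One filter per dimension (in coordinated views, each chart owns one dimension)
filters = {
    "age": lambda a: 20 <= a < 45,
    "gender": lambda g: g == "female",
}


def group_counts(records, dimension, filters):
    """Count records per value of `dimension`, applying every filter except
    the one on `dimension` itself."""
    counts = {}
    for r in records:
        if all(f(r[d]) for d, f in filters.items() if d != dimension):
            counts[r[dimension]] = counts.get(r[dimension], 0) + 1
    return counts


print(group_counts(users, "occupation", filters))  # {'student': 1, 'engineer': 1, 'artist': 1}
print(group_counts(users, "gender", filters))      # {'female': 3, 'male': 1}
```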
Time-based visualization
• Various approaches have been proposed for interactively visualizing the time-based activities of users.
• EventFlow is an example that leverages the time dimension: groups of users are shown along their temporal actions in a visual interface (https://hcil.umd.edu/eventflow/).
[Monroe et al., TVCG’13]
Figure annotations: group of patients with common treatments; length of treatments.
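The aggregation behind such views can be illustrated by grouping users on shared prefixes of their event sequences. This is a simplified sketch of the idea, not EventFlow's actual algorithm; the patient histories and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical patient histories: ordered (treatment, duration in days) events
histories = {
    "p1": [("A", 10), ("B", 5)],
    "p2": [("A", 12), ("B", 7)],
    "p3": [("A", 8), ("C", 3)],
    "p4": [("C", 4)],
}


def aggregate_prefixes(histories):
    """Group patients by each common prefix of their treatment sequences and report
    the group size and the mean duration of the prefix's last treatment."""
    groups = defaultdict(list)
    for events in histories.values():
        for i in range(1, len(events) + 1):
            prefix = tuple(name for name, _ in events[:i])
            groups[prefix].append(events[i - 1][1])
    return {p: (len(d), sum(d) / len(d)) for p, d in groups.items()}


for prefix, (size, avg_days) in sorted(aggregate_prefixes(histories).items()):
    print(" -> ".join(prefix), size, avg_days)
```

Each prefix corresponds to a group of patients with common treatments, and the mean duration plays the role of the length of treatments in the figure.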
Takeaways from the first session
• Behavioral analysis aims to extract value from user data.
• User data is modeled as a bipartite graph with users on one side and items on the other, connected by actions.
• The user data analysis pipeline contains user data preparation, mining and recommendation, presentation, and exploration.
• We often obtain user data by collecting, crawling (scraping), or downloading from dataset repositories.
• The main tasks in user data cleaning deal with missing values, outliers, data improvement, data tidy-up, and data scaling.
• At the core of visualizing user data is a mapping function that associates user characteristics with visual variables.
• Visualizing the evolution of users over time needs special care.
SESSION 2: User Data Mining and Recommendation
User data mining and recommendation
• We employ user data for two separate tasks: mining and recommendation.
• Mining
  • Goal: to understand and represent user behaviors in the captured data.
  • A famous application in industry is cross-selling: “customers who bought this item also bought …”.
  • The fundamental assumption is that there exist groups of user activities, formed by like-minded users, that constitute different instances of user behavior. Hence, the main action is grouping.
• Recommendation
  • Goal: to predict future user behaviors in the captured data. Recommendation is a great approach for personalization.
  • The fundamental assumption is that there exists a latent relation in user interactions, which can also predict possible future interactions. Hence, the main action is relation discovery.
http://cliintel.com/diapers-beer-and-data-in-retail/
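Cross-selling in its simplest form counts co-occurrences. A minimal sketch on hypothetical shopping baskets; a production system would typically normalize these counts (e.g., into support and confidence) rather than rank raw counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"bread", "milk"},
]

# Count how often each ordered pair of items is bought together
co_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1


def also_bought(item, k=2):
    """Items most often bought together with `item` (ties broken alphabetically)."""
    pairs = [(other, n) for (i, other), n in co_counts.items() if i == item]
    pairs.sort(key=lambda p: (-p[1], p[0]))
    return pairs[:k]


print(also_bought("diapers"))  # [('beer', 2), ('milk', 2)]
```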
User data mining
• The main action in user data mining is grouping, which usually takes place in an unsupervised setting.
• We need two elements to group users: a distance function and a representation approach.
• The distance function imposes the grouping (mining) semantics: it determines whether two users should or should not be placed in a common group. It is sometimes expressed inversely as a similarity function.
• The representation approach defines how each mined group should be labeled. In the following example, majority voting is used for representation.
Figure: Mia likes 60 drama movies and 40 action movies. What is her distance to the group of drama-genre lovers, and to the group of action-genre lovers?
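The two elements can be sketched for the Mia example. Assumptions: users and groups are represented by genre proportions, the distance is L1 (one option among many), the two group profiles are hypothetical, and majority voting labels a group by the genre most of its members prefer.

```python
def proportions(counts):
    """Turn raw counts per genre into proportions."""
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}


def l1_distance(p, q):
    """L1 distance between two genre-proportion profiles."""
    genres = set(p) | set(q)
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in genres)


def majority_label(members):
    """Majority voting: label a group by the genre most of its members prefer."""
    votes = {}
    for counts in members:
        top = max(counts, key=counts.get)
        votes[top] = votes.get(top, 0) + 1
    return max(votes, key=votes.get)


mia = proportions({"drama": 60, "action": 40})
drama_group = {"drama": 0.9, "action": 0.1}   # hypothetical group profiles
action_group = {"drama": 0.2, "action": 0.8}
print(round(l1_distance(mia, drama_group), 2), round(l1_distance(mia, action_group), 2))  # 0.6 0.8

members = [{"drama": 60, "action": 40}, {"drama": 70, "action": 30}, {"drama": 10, "action": 90}]
print(majority_label(members))  # drama
```

Under this distance Mia is closer to the drama-genre group; a different distance function could change that decision, which is why the distance imposes the mining semantics.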
A myriad of grouping methods
• Community and Clique Detection [Newman, Physical J.’04], [Barbieri et al., ICDM’13], [Goyal et al., CIKM’08]
• Team and Tribe Formation [Nikolaev et al., KDD’16]
• Segment Discovery [Amer-Yahia et al., WWW’2017] (figure: segments on IMDb, their rating distributions, and segment exploration with rating maps)
• Pattern and Cube Mining [Xin et al., KDD’06], [Kamat et al., ICDE’14]
• Clustering and Partitioning [Agrawal et al., ACM’1998], [Pedreira et al., VLDB’16]
• Cohort Representation [Jiang et al., VLDB’16], [Omidvar-Tehrani, Amer-Yahia, Lakshmanan @ DSAA’18]
Python libraries for clustering
• Clustering algorithms can also be used as a commodity through high-level Python libraries.
• Among many successful libraries, scikit-learn is a popular and standard one.
• For k-means, given the data and the number of clusters, the library does the rest.
• For DBSCAN, given the data, the neighborhood radius (eps), and the minimum number of users in a neighborhood (min_samples), the library does the rest.
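# k-means
from sklearn.cluster import KMeans
import numpy as np
data = np.array([[1, 2], [1, 4],
                 # … (remaining rows elided on the slide)
                 ])
clusters = KMeans(n_clusters=2).fit(data)
print(clusters.labels_)
# on the full data, e.g.: [1, 1, 1, 0, 0, …]
print(clusters.predict([[12, 3]]))  # predict() expects a 2-D array
# e.g.: [0]
# DBSCAN
from sklearn.cluster import DBSCAN
import numpy as np
data = np.array([[1, 2], [2, 2],
                 # … (remaining rows elided on the slide)
                 ])
clusters = DBSCAN(eps=3, min_samples=2).fit(data)
print(clusters.labels_)
# on the full data, e.g.: [ 0, 0, 0, 1, 1, -1, …]  (-1 marks noise)
# DBSCAN has no predict(); fit_predict() clusters the data it is given from scratch
labels = DBSCAN(eps=3, min_samples=2).fit_predict(data)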
FIM: Definitions
• We are given a set of items $\mathcal{I}$, where any subset of $\mathcal{I}$ is an itemset.
• We are also given a transactional (un-tidy) dataset $\mathcal{T}$ where each member of $\mathcal{T}$ is an itemset.
• Given an itemset $X \subseteq \mathcal{I}$, $support(X)$ is the number of transactions containing $X$.
• An itemset $X \subseteq \mathcal{I}$ is a frequent itemset if $support(X) \ge \delta$, where $\delta$ is the minimum support threshold.
• Given two itemsets $X \subseteq \mathcal{I}$ and $Y \subseteq \mathcal{I}$, an association rule $X \rightarrow Y$ with confidence $c$ holds if $c \ge \delta'$ ($\delta'$ is the minimum confidence threshold), where $c = support(X \cup Y) / support(X)$.
FIM: Example
Transaction user dataset: one transaction per user (u1, u2, u3, u4, u5, u6), listing the movies that the user watched [movie posters not recoverable].
• {The Terminal, Forrest Gump, The Pianist} is a frequent itemset.
  absolute support = 4
  relative support = 4/6 ≈ 67%
• {Forrest Gump, The Pianist} → {The Terminal} is an association rule.
  confidence = 4/5 = 80%
• {The Pianist} → {The Terminal, Forrest Gump} is another association rule.
  confidence = 4/6 ≈ 67%
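The support and confidence figures can be checked with a few lines of Python. The transactions are an assumption: they are the six users' watch histories from the user table used later in the course (slide "Full-fledged behaviors"), since the movie posters of this slide are not recoverable.

```python
# Watch histories per user (assumed; taken from the user table used later in the course).
transactions = {
    "u1": {"Terminal", "Forrest", "Pianist", "Psycho", "Unhinged"},
    "u2": {"Terminal", "Forrest", "Pianist", "Unhinged"},
    "u3": {"Pianist"},
    "u4": {"Forrest", "Pianist"},
    "u5": {"Terminal", "Forrest", "Pianist", "Psycho"},
    "u6": {"Terminal", "Forrest", "Pianist"},
}

def support(itemset):
    """Absolute support: number of transactions containing every item of itemset."""
    return sum(1 for t in transactions.values() if itemset <= t)

def confidence(x, y):
    """Confidence of the rule x -> y, i.e., support(x | y) / support(x)."""
    return support(x | y) / support(x)

tfp = {"Terminal", "Forrest", "Pianist"}
print(support(tfp), support(tfp) / len(transactions))    # 4 0.6666666666666666
print(confidence({"Forrest", "Pianist"}, {"Terminal"}))  # 0.8
print(confidence({"Pianist"}, {"Terminal", "Forrest"}))  # 0.6666666666666666
```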
FIM: Computation
• Apriori algorithm. It is a level-wise search (first 1-itemsets, then 2-itemsets, …) that exploits the following pruning opportunity: if an itemset is not frequent, then none of its supersets is frequent.
• For instance, if {Psycho, Unhinged} is not frequent, then {Psycho, Unhinged, The Pianist} won’t be frequent either.
• For instance, given a minimum support threshold of 3, the itemset {young, CA, student} (2 records) is not frequent, and neither is its superset {male, young, CA, student} (2 records).
[Agrawal et al., SIGMOD’93]
[Figure from “Multi-Objective Group Discovery on the Social Web (Technical Report)”: itemset lattice over {male, young, CA, student}, each node annotated with its number of records, e.g., {} 3662, {male} 2634, {young} 2147, {CA} 664, {student} 184, {male, young} 1588, {young, CA, student} 2, {male, young, CA, student} 2]
[Omidvar-Tehrani, Amer-Yahia, Dutot, Trystram @ PKDD’16]
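A minimal level-wise Apriori sketch in Python, written for clarity rather than speed: support is recounted by scanning all transactions. Candidate generation follows the textbook formulation (join frequent itemsets, then prune any candidate with an infrequent subset). The transactions reuse the assumed six-user watch histories. This is not the LCM implementation used in the hands-on.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset itemset: support} for all itemsets with support >= min_support.

    transactions: list of sets; min_support: absolute count threshold (delta).
    """
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        # join step: unite pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent

transactions = [
    {"Terminal", "Forrest", "Pianist", "Psycho", "Unhinged"},
    {"Terminal", "Forrest", "Pianist", "Unhinged"},
    {"Pianist"},
    {"Forrest", "Pianist"},
    {"Terminal", "Forrest", "Pianist", "Psycho"},
    {"Terminal", "Forrest", "Pianist"},
]
result = apriori(transactions, min_support=4)
largest = max(result, key=len)
print(sorted(largest), result[largest])  # ['Forrest', 'Pianist', 'Terminal'] 4
```

Psycho and Unhinged (support 2 each) are dropped at the first level, so no superset containing them is ever counted. That is the pruning opportunity described above.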
FIM for mining describable groups of users
• We employ an efficient frequent itemset miner called LCM (Linear time Closed itemset Miner) for mining groups in user data.
• Step 1. Identifiers for both users and items should be mapped to a non-negative integer space (required by LCM). For instance, if the movie Titanic (as an item) is mapped to “25” and the user “John” is mapped to “120”, the tuple <120,25> means that John has watched the movie Titanic.
• Step 2. We transform a tidy dataset into an un-tidy (transactional) dataset, where each line represents one user and lists all item IDs associated with that user, separated by spaces.
• Step 3. Run LCM to mine groups.
• Each line in the output file returned by LCM represents one group.
[Takeaki et al., Discovery Science ’04]
http://research.nii.ac.jp/~uno/code/lcm.html
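Steps 1 and 2 can be sketched in plain Python. Identifiers are mapped to consecutive non-negative integers in order of first appearance, and each user becomes one space-separated line. The function name `to_transactions` and the sample pairs are illustrative.

```python
def to_transactions(pairs):
    """Turn tidy (user, item) pairs into LCM-style transaction lines.

    Users and items are mapped to consecutive non-negative integers in order of
    first appearance; line k of the output lists the items of the user with ID k.
    """
    user_ids, item_ids, baskets = {}, {}, {}
    for user, item in pairs:
        u = user_ids.setdefault(user, len(user_ids))
        i = item_ids.setdefault(item, len(item_ids))
        baskets.setdefault(u, set()).add(i)
    lines = [" ".join(str(i) for i in sorted(baskets[u])) for u in range(len(user_ids))]
    return lines, user_ids, item_ids

pairs = [("John", "Titanic"), ("John", "Psycho"), ("Mia", "Titanic")]
lines, user_ids, item_ids = to_transactions(pairs)
print(lines)     # ['0 1', '0']
print(user_ids)  # {'John': 0, 'Mia': 1}
```

Writing `"\n".join(lines)` to a file gives the transactional input. Keep `user_ids` and `item_ids` so that mined groups can be translated back to names.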
Full-fledged behaviors in user data mining
• With the approach discussed in the previous slides, we can obtain groups based solely on the co-occurrence of items.
• It is more desirable to mix demographics and items to obtain groups such as “middle-aged females in Grenoble who watched The Terminal and Forrest Gump.”
• It is possible to encode user attributes in the same transactional database. Then LCM will give us full-fledged groups.
user gender age movies watched
u1 F Young Terminal, Forrest., Pianist, Psycho, Unhinged
u2 F Middle Terminal, Forrest., Pianist, Unhinged
u3 M Middle Pianist
u4 F Young Forrest., Pianist
u5 F Middle Terminal, Forrest., Pianist, Psycho
u6 M Middle Terminal, Forrest., Pianist
movie code
Terminal 1
Forrest. 2
Pianist 3
Psycho 4
Unhinged 5
attribute value code
Female 101
Male 102
Young 103
Middle 104
line # Transaction
1 1 2 3 4 5 101 103
2 1 2 3 5 101 104
3 3 102 104
4 2 3 101 103
5 1 2 3 4 101 104
6 1 2 3 102 104
LCM output (un-tidy encoding): [1 2 3 101 104] (2) [2 5]
Translated back: [Terminal Forrest. Pianist Female Middle] (2) [u2 u5]
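The encoding and translation steps of this example can be reproduced with a short Python sketch. The codes come from the tables above. The sorted, space-separated line layout and the 1-based line numbers follow the example. The helper names `encode` and `translate` are illustrative.

```python
movie_code = {"Terminal": 1, "Forrest.": 2, "Pianist": 3, "Psycho": 4, "Unhinged": 5}
attr_code = {"Female": 101, "Male": 102, "Young": 103, "Middle": 104}
decode = {code: name for name, code in {**movie_code, **attr_code}.items()}

# user -> (gender, age, movies watched), as in the user table above
users = {
    "u1": ("Female", "Young", ["Terminal", "Forrest.", "Pianist", "Psycho", "Unhinged"]),
    "u2": ("Female", "Middle", ["Terminal", "Forrest.", "Pianist", "Unhinged"]),
    "u3": ("Male", "Middle", ["Pianist"]),
    "u4": ("Female", "Young", ["Forrest.", "Pianist"]),
    "u5": ("Female", "Middle", ["Terminal", "Forrest.", "Pianist", "Psycho"]),
    "u6": ("Male", "Middle", ["Terminal", "Forrest.", "Pianist"]),
}

def encode(gender, age, movies):
    """One transaction line: movie codes plus demographic codes, space-separated."""
    codes = sorted([movie_code[m] for m in movies] + [attr_code[gender], attr_code[age]])
    return " ".join(map(str, codes))

lines = [encode(*users[u]) for u in sorted(users)]

def translate(codes, line_numbers):
    """Map an output group back to names; line numbers are 1-based, as on the slide."""
    return [decode[c] for c in codes], ["u%d" % n for n in line_numbers]

print(lines[1])  # 1 2 3 5 101 104
print(translate([1, 2, 3, 101, 104], [2, 5]))
# (['Terminal', 'Forrest.', 'Pianist', 'Female', 'Middle'], ['u2', 'u5'])
```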
Hands-on 3: Mining user groups
• Step 1. Find the MovieLens 1M dataset on a dataset repository and download it. The dataset contains the ratings that users gave to the movies they watched. We only need the file ratings.dat.
• Step 2. Download the Python file pmr.py from the following link and complete it: https://drive.google.com/drive/folders/1xMxGdcI2IGgTAhozDUqSfZAWzKVXfkjr?usp=sharing.
• Step 3. Run the code to obtain the output file pmr.txt.
• Step 4. Download the LCM software from the following link: https://drive.google.com/drive/folders/1xMxGdcI2IGgTAhozDUqSfZAWzKVXfkjr?usp=sharing.
• Step 5. Put the dataset file in the same folder as LCM.
• Step 6. Run LCM as follows:
./lcm CfI -l 5 -u 100 pmr.txt 3 out.txt
• Step 7. Open the output file out.txt. Each line in out.txt represents a group with the following structure: [set of items] (support) [set of users]. The description of the group is [set of items]; the set of group members is [set of users].
• Step 8. Try to find 5 interesting user groups.
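For Step 8, it helps to load out.txt programmatically. The sketch below assumes every line follows exactly the layout given in Step 7. The layout LCM actually writes can vary with its version and options, so the regular expression may need adjusting.

```python
import re

# "[items] (support) [users]", the group layout described in Step 7 (assumed exact).
GROUP_LINE = re.compile(r"^\[([^\]]*)\]\s*\((\d+)\)\s*\[([^\]]*)\]$")

def parse_group(line):
    """Return (items, support, users) for one output line, or None if it does not match."""
    match = GROUP_LINE.match(line.strip())
    if match is None:
        return None
    items = [int(x) for x in match.group(1).split()]
    users = [int(x) for x in match.group(3).split()]
    return items, int(match.group(2)), users

print(parse_group("[1 2 3 101 104] (2) [2 5]"))  # ([1, 2, 3, 101, 104], 2, [2, 5])
```

Parsing every line of out.txt this way and sorting by support (or by description length) gives a quick shortlist of candidate groups to inspect.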
Poll: Challenge of mining user groups
• Question. Following the steps in the previous hands-on, what is the most challenging aspect of mining groups that remains unsolved?
[Poll chart “Challenges of user data mining”: votes (0 to 4) for Efficiency, Overlap, Size of clusters, Explainability, Mechanism, and Binning]
Multi-objective optimization
• Mining groups that balance the following objectives is a multi-objective optimization problem.
• Given a set of ratings $R$, identify all group-sets $G$ where each group-set satisfies:
  • $coverage(G, R)$ is maximized: most input records belong to at least one group in the output;
  • $diversity(G, R)$ is maximized: the found groups are as different as possible from each other;
  • $diameter(G, R)$ is minimized: ratings within each group are homogeneous.
• The problem is proved to be NP-complete by a reduction from the Exact 3-Set Cover problem (EC3).
[Omidvar-Tehrani, Amer-Yahia, Dutot, Trystram @ PKDD’16]
Diameter objective
• Diameter is a simple but effective measure of variance in ratings.
• Below, we observe that most reviewers agree on a high score for The Godfather → minimum diameter.
• We also observe that reviewers are divided when rating Fifty Shades of Grey → maximum diameter.
[Figure: IMDb rating distributions (count in % vs. rating score 1 to 10). The Godfather (1972): homogeneous rating distribution, minimum diameter. Fifty Shades of Grey (2015): polarized rating distribution, maximum diameter. Other rating distributions exist as well: increasing, decreasing, heterogeneous, etc.]
Pareto group discovery
• A bottom-up exhaustive approach to discover the Pareto front.
• Generating fewer plans makes a multi-objective optimization algorithm run faster.
[Figure: User groups as Pareto fronts in the coverage-diversity plane (both axes from 0 to 1), showing candidate, rejected, and Pareto group-sets, the dominance area, and the α-dominance area together with the group-sets that α-dominance additionally rejects]
An approximation algorithm for Pareto group discovery
1. Inputs are $k$, $\alpha > 1$, and $R$.
2. Output is the Pareto result set $\mathcal{P}$.
3. $\mathcal{P} \leftarrow \emptyset$
4. For all user groups $g$ do
   1. $G \leftarrow$ singleton group-set containing $g$
   2. If $G$ is not $\alpha$-dominated by any other group-set $\in \mathcal{P}$, then add $G$ to $\mathcal{P}$
5. For $n \in [2, k]$ do
   1. For each possible group-set $G$ of size $n$ do
      1. If $G$ is not $\alpha$-dominated by any other group-set $\in \mathcal{P}$, then add $G$ to $\mathcal{P}$
6. Return $\mathcal{P}$
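The listed steps can be sketched in Python. Apart from the control flow, everything below is an assumption. α-dominance is taken as "α·a_i ≥ b_i for every objective i", with all objectives maximized. The coverage and diversity functions are illustrative stand-ins, not the definitions from the cited paper: diversity is one minus the average pairwise Jaccard overlap of group members. Diameter is left out, and the groups are hypothetical.

```python
from itertools import combinations

def alpha_dominates(a, b, alpha):
    """Assumed definition: objective vector a alpha-dominates b (all objectives
    maximized) if alpha * a[i] >= b[i] for every objective i."""
    return all(alpha * x >= y for x, y in zip(a, b))

def approx_pareto(groups, k, alpha, objectives):
    """Follows the listed steps: singletons first, then group-sets of size 2..k."""
    result = []  # the Pareto result set P, stored with objective vectors

    def consider(group_set):
        vec = objectives(group_set)
        if not any(alpha_dominates(v, vec, alpha) for _, v in result):
            result.append((group_set, vec))

    for g in groups:
        consider((g,))
    for n in range(2, k + 1):
        for group_set in combinations(groups, n):
            consider(group_set)
    return [group_set for group_set, _ in result]

# Hypothetical groups (name -> member user IDs) over 5 users.
groups = {"g1": {1, 2, 3}, "g2": {3, 4}, "g3": {5}}
all_users = {1, 2, 3, 4, 5}

def objectives(group_set):
    """Illustrative stand-ins: (coverage, diversity), both to be maximized."""
    members = [groups[g] for g in group_set]
    coverage = len(set().union(*members)) / len(all_users)
    pairs = list(combinations(members, 2))
    if not pairs:
        return coverage, 0.0
    overlap = sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
    return coverage, 1.0 - overlap

print(approx_pareto(list(groups), k=2, alpha=1.1, objectives=objectives))
# [('g1',), ('g1', 'g2'), ('g1', 'g3')]
```

As in the listed steps, a group-set already in P is never removed, so ('g1',) stays even though ('g1', 'g2') dominates it. A final filtering pass could drop such entries. A minimized objective such as diameter would have to be passed in transformed form (e.g., 1 minus a normalized diameter).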
Hands-on 4: Mining multi-objective user groups
• Step 0. We continue the previous hands-on, so we need the mined groups.
• Step 1. Download and unzip the file MOMRI.zip at the following link: https://drive.google.com/drive/folders/1M-HlNao9tYwqN0imeZ-SzHnGZKMoJgh4?usp=sharing. It is a Java NetBeans project whose main package is “MOQO.MRI” and whose main executable is MOMRI.java.
• Step 2. Run the algorithm. The output of the algorithm reports the progress in finding Pareto plans.
• Step 3. Add a new objective to the optimizer.
• Download the documentation at https://drive.google.com/file/d/1BE1jL2Lp327_Lxb1MMudY2p6l1tG_Uj4/view?usp=sharing.
Excerpt from the documentation (input data): The parameter “ds” (line 21 of MOMRI.java) specifies the name of the dataset to use. MovieLens 1M (ds=“ml1m”) is considered the default dataset. You can also try the MovieLens 100K dataset (ds=“ml100k”). The method “read ratings()” in line 30 of MOMRI.java reads ratings from the data file on disk. The data file is hosted in the “da…
[Screenshot annotations: Executable file, Parameters, Output]
Recommendation systems
• Recommendation systems are designed to automatically find relevant and desirable items for users to consume in the future.
• In general, these systems work by predicting the items that are likely to be most appealing to users based on their preferences.
• Intuitively, the problem of recommendation reduces to filling in missing values in the user-item interaction matrix.
[Amer-Yahia and Benouaret, BigData’20]
Multi-scale rating user-item interaction matrix (- = missing rating):
      Terminal  Forrest.  Pianist  Psycho  Unhinged
u1    5         4         5        4       3
u2    4         5         5        -       -
u3    -         -         4        -       -
u4    -         3         3        -       -
u5    3         2         3        2       -
u6    3         4         2        -       -
Question. How would u2 rate the movie Psycho in the future?
Answer. Probably like other users similar to u2, such as u1 or u5.
Question. Is u2 more similar to u1 or u5?
Answer. Following their ratings for The Terminal, Forrest Gump, and The Pianist, u2 is more similar to u1. Hence u2 would probably rate Psycho around 4, like u1 did.
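The reasoning above can be sketched in Python. Two assumptions are made, since the slide names no measure: "similar" means a small Euclidean distance over co-rated movies, and the prediction copies the nearest neighbor's rating. Only users who rated Psycho (u1 and u5) are candidate neighbors.

```python
import math

ratings = {
    "u1": {"Terminal": 5, "Forrest.": 4, "Pianist": 5, "Psycho": 4, "Unhinged": 3},
    "u2": {"Terminal": 4, "Forrest.": 5, "Pianist": 5},
    "u3": {"Pianist": 4},
    "u4": {"Forrest.": 3, "Pianist": 3},
    "u5": {"Terminal": 3, "Forrest.": 2, "Pianist": 3, "Psycho": 2},
    "u6": {"Terminal": 3, "Forrest.": 4, "Pianist": 2},
}

def distance(a, b):
    """Euclidean distance between two users over the movies both have rated."""
    common = set(ratings[a]) & set(ratings[b])
    return math.sqrt(sum((ratings[a][m] - ratings[b][m]) ** 2 for m in common))

def predict(user, movie):
    """Copy the rating of the closest user who has rated the movie."""
    neighbors = [v for v in ratings if v != user and movie in ratings[v]]
    nearest = min(neighbors, key=lambda v: distance(user, v))
    return nearest, ratings[nearest][movie]

print(predict("u2", "Psycho"))  # ('u1', 4)
```

The choice of measure matters here. Pearson correlation gives u2 the same value (-0.5) against both u1 and u5, because u1 and u5 deviate identically from their own means on these three movies. A mean-centered measure would therefore not break this tie.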
Types of recommendation systems
• Rule-based approaches used to be the dominant method for recommendation, and they are still used in industry.
• The most common state-of-the-art approaches are content-based filtering and collaborative filtering.
• Content-based filtering recommends items similar to the ones that the user liked before.
• Collaborative filtering recommends items that are popular among the neighbors of the user.
CONTENT-BASED
Nina likes 60 drama movies, 20 romance, and 20 action.
• La Vie en Rose (Biography, Drama): 60% similarity
• Me before You (Drama, Romance): 80% similarity
• Memento (Mystery, Thriller): 0% similarity
“Me before You” will be ranked higher than “La Vie en Rose” in Nina’s content-based recommendation.
COLLABORATIVE
Nina’s taste overlaps with Stephanie’s (more impact) and Charles’s (less impact).
• Stephanie has the same taste as Nina and likes “La Vie en Rose” more than “Me before You”.
• Charles’s taste is somewhat different from Nina’s, and he likes “Memento” more than “Me before You”.
“La Vie en Rose” will be ranked higher than “Memento” in Nina’s collaborative recommendation.
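A minimal content-based scoring sketch. It assumes the similarity between Nina and a movie is the share of her liked movies whose genre appears among the movie's genres. This assumed measure reproduces the 60%, 80%, and 0% figures above, but the slide does not state how they were computed.

```python
nina = {"drama": 60, "romance": 20, "action": 20}  # movies Nina liked, per genre
movies = {
    "La Vie en Rose": ["biography", "drama"],
    "Me before You": ["drama", "romance"],
    "Memento": ["mystery", "thriller"],
}

def similarity(profile, genres):
    """Share of the user's liked movies that fall in any of the item's genres."""
    return sum(profile.get(g, 0) for g in genres) / sum(profile.values())

for title, genres in movies.items():
    print(title, similarity(nina, genres))
# La Vie en Rose 0.6
# Me before You 0.8
# Memento 0.0

ranking = sorted(movies, key=lambda t: similarity(nina, movies[t]), reverse=True)
```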
Collaborative filtering
• As collaborative filtering (CF) captures “like-minded behaviors”, it is often a favorite recommendation option.
• Two methods have been proposed for implementing a CF approach: memory-based and model-based.
• In a memory-based implementation, the entire user-item interaction matrix is employed.
• In a model-based implementation, a model of users is built to learn their preferences.
[Figure: memory-based CF lies towards more simplicity, model-based CF towards more efficiency]
Similarity between users
• An important step in recommendation is to compare all users to the input user and find the ones that are most similar.
• This is done using Pearson correlation.
• To measure the similarity between the tastes of Sara and Anderson, let’s assume x is Sara’s taste vector and y is Anderson’s, both rating the same n movies:
  $r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$
• The value r lies in the range -1 to +1, where +1 means that Sara and Anderson have perfectly similar tastes, and -1 means the opposite.
• In practice, this correlation cannot be computed against every single user, hence we often use a small sample of users.
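The formula translates directly into a small Python function. Sara's and Anderson's ratings below are hypothetical. Returning 0 for constant vectors, where r is undefined, is a convention chosen for this sketch.

```python
import math

def pearson(x, y):
    """Pearson correlation r between two equally long rating vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # r is undefined for constant vectors; 0 is a convention chosen here
    return cov / (norm_x * norm_y)

# Hypothetical ratings by Sara (x) and Anderson (y) for the same n = 4 movies.
sara = [5, 3, 4, 1]
anderson = [4, 2, 5, 1]
print(round(pearson(sara, anderson), 3))  # 0.855
```

Applied to u2's and u1's ratings of The Terminal, Forrest Gump, and The Pianist ([4, 5, 5] vs. [5, 4, 5]), it gives -0.5.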
Memory-based CF: user-based
• Common implementations are user-based and item-based. We practice the former.
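import pandas as pd
import numpy as np
movies_df, ratings_df = read_data(...)  # loader not shown on the slide
# ratings of the input user, with columns movie_id and rating (contents not shown on the slide)
user_preferences = pd.DataFrame(columns=["movie_id", "rating"])
# keep only ratings of movies that the input user has also rated
user_subset = ratings_df[ratings_df["movie_id"].isin(user_preferences["movie_id"].tolist())]
user_subset_group = user_subset.groupby("user_id")
# favor the 100 users who share the most movies with the input user
user_subset_group = sorted(user_subset_group, key=lambda x: len(x[1]), reverse=True)[:100]
pearson_correlation_dict = {}
for name, group in user_subset_group:
    # pearson_correlation() is a helper not shown on the slide
    pearson_correlation_dict[name] = pearson_correlation(user_preferences, group)
# keep the 50 most similar users and attach all of their ratings
sim_df = pd.DataFrame({"user_id": list(pearson_correlation_dict.keys()),
                       "sim": list(pearson_correlation_dict.values())})
top_users = sim_df.sort_values("sim", ascending=False).head(50)
top_users_rating = top_users.merge(ratings_df, on="user_id")
top_users_rating["weighted_rating"] = top_users_rating["sim"] * top_users_rating["rating"]
recommendation_df = top_users_rating.groupby("movie_id")[["sim", "weighted_rating"]].sum()
# similarity-weighted average rating per movie, best first
recommendation_df["score"] = recommendation_df["weighted_rating"] / recommendation_df["sim"]
recommendation_df = recommendation_df.sort_values("score", ascending=False)
# assumes movies_df uses the same movie_id column name
final_rec = movies_df.loc[movies_df["movie_id"].isin(recommendation_df.head(10).index)]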
Model-based CF: matrix factorization
• CF as a “neighborhood” method, focusing on maximizing “closeness”, does not handle scalability issues and noise well.
• CF operates on low-level (raw) data, which does not capture similarities between users at higher levels well.
• Matrix factorization is a solution for both aforementioned issues.
• Factorization is a simple but fundamental operation in mathematics, e.g., representing “12” by its factors “4” and “3”.
• In the context of recommendation, it is the task of factorizing the user-item interaction matrix into two matrices corresponding to users and items.
Model-based CF with SVD
• To get the lower-rank approximation, we employ SVD and keep the top k latent features, which capture the most important underlying tastes.
• For illustration purposes, we consider k = 2, but k ≈ 50 is more common in practice.
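import pandas as pd
import numpy as np
from scipy.sparse.linalg import svds
# step 1: load the data (get_data is a loader not shown on the slide)
ratings_df, users_df, movies_df = get_data(...)
# step 2: build the user-item matrix (assumed columns user_id, movie_id, rating; missing ratings set to 0)
ratings_pivot_df = ratings_df.pivot(index="user_id", columns="movie_id", values="rating").fillna(0)
U, sigma, Vt = svds(ratings_pivot_df.to_numpy(dtype=float), k=2)
sigma = np.diag(sigma)
# step 3: reconstruct the matrix; formerly missing cells now hold predicted ratings
predictions = np.dot(np.dot(U, sigma), Vt)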
Step 1 (original dataset; ?? = missing rating)
      Terminal  Forrest.  Pianist
u1    4.5       3         ??
u2    5         5         2
Step 2 (factor matrices with latent features f1, f2)
Matrix U:
      f1    f2
u1    1.1   2.3
u2    2.1   1
Matrix V:
          f1    f2
Terminal  1.9   1
Forrest.  2.3   0
Pianist   0     2
Matrix Vt:
      Terminal  Forrest.  Pianist
f1    1.9       2.3       0
f2    1         0         2
Step 3 (reconstructed dataset, U × Vt)
      Terminal  Forrest.  Pianist
u1    4.39      2.53      4.6
u2    4.99      4.83      2
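The arithmetic of the worked example can be checked with NumPy. Multiplying the example's U (2 × 2) by Vt (2 × 3) reproduces the reconstructed table, and u1's missing Pianist rating is filled in as 4.6. The factor values are the illustrative ones from the slide, not the output of `svds`. In the `svds` code the singular values are kept separately in `sigma`; here they are assumed to be folded into U and Vt.

```python
import numpy as np

# Factor matrices from the example (k = 2 latent features f1, f2).
U = np.array([[1.1, 2.3],    # u1
              [2.1, 1.0]])   # u2
Vt = np.array([[1.9, 2.3, 0.0],   # f1 for Terminal, Forrest., Pianist
               [1.0, 0.0, 2.0]])  # f2 for Terminal, Forrest., Pianist

reconstructed = U @ Vt
print(np.round(reconstructed, 2))
```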
Deep learning for recommendation
• So far, we covered neighborhood and matrix factorization methods for recommendation.
• For more efficiency and precision, we also look at deep learning approaches, currently an active trend in recommendation.
• Deep learning is hungry for data, hence we often use implicit-feedback data rather than explicit-feedback data.
Explicit-feedback interaction matrix (- = no rating):
      Terminal  Forrest.  Pianist  Psycho  Unhinged
u1    5         4         5        4       3
u2    4         5         5        -       -
u3    -         -         4        -       -
u4    -         3         3        -       -
u5    3         2         3        2       -
u6    3         4         2        -       -
Implicit-feedback interaction matrix:
      Terminal  Forrest.  Pianist  Psycho  Unhinged
u1    1         1         1        1       1
u2    1         1         1        0       0
u3    0         0         1        0       0
u4    0         1         1        0       0
u5    1         1         1        1       0
u6    1         1         1        0       0
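A minimal NumPy sketch of the conversion. It assumes missing explicit ratings are stored as NaN and that any rating counts as an interaction.

```python
import numpy as np

nan = np.nan
# Explicit ratings (rows u1..u6; columns Terminal, Forrest., Pianist, Psycho, Unhinged).
explicit = np.array([
    [5, 4, 5, 4, 3],
    [4, 5, 5, nan, nan],
    [nan, nan, 4, nan, nan],
    [nan, 3, 3, nan, nan],
    [3, 2, 3, 2, nan],
    [3, 4, 2, nan, nan],
])
# Implicit feedback: 1 if the user interacted with (here: rated) the item, else 0.
implicit = (~np.isnan(explicit)).astype(int)
print(implicit)
```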
Neural Collaborative Filtering (NCF)
• We employ a simple but efficient implementation of a deep neural network for recommendation, called Neural Collaborative Filtering (NCF).
import pandas as pd
import numpy as np
import torch.nn as nn
import pytorch_lightning as pl  # assumption: the trainer comes from PyTorch Lightning
# read_data, filter_to, split_train_test, make_implicit_data and NCF are helpers not shown on the slide
ratings = read_data()
# make the algorithm scalable: keep a random 10% of the users
ratings = filter_to(ratings, 0.1)
train_ratings, test_ratings = split_train_test(ratings)
# mark all seen data as “1” and …
# … pick a few negative examples
users, items, labels = make_implicit_data(train_ratings)
num_users = ratings["user_id"].max() + 1  # embedding table sizes, assuming integer IDs
num_items = ratings["movie_id"].max() + 1
movies = ratings["movie_id"].unique()
model = NCF(num_users, num_items, train_ratings, movies)
trainer = pl.Trainer(max_epochs=5)
trainer.fit(model)
[He et al. ArXiv’17]
# step 1: sample 10% of the users at random
random_users = np.random.choice(ratings['user_id'].unique(),
                                size=int(len(ratings['user_id'].unique()) * 0.1), replace=False)
# step 2: keep only the ratings of the sampled users
ratings = ratings.loc[ratings['user_id'].isin(random_users)]
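# step 1: rank each user's ratings from the most recent (rank 1) to the oldest
ratings['rank_latest'] = ratings.groupby(['user_id'])['timestamp'].rank(method='first', ascending=False)
# step 2: hold out each user's most recent rating for testing, train on the rest
train_ratings = ratings[ratings['rank_latest'] != 1]
test_ratings = ratings[ratings['rank_latest'] == 1]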
[He et al. ArXiv’17]
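The step "mark all seen data as 1 and pick a few negative examples" can be sketched in plain Python. The name `make_implicit_data` matches the slide's helper. Its body is an assumption for illustration, as are the default of 4 negatives per positive and the uniform sampling of negatives from each user's unseen items.

```python
import random
from collections import defaultdict

def make_implicit_data(train_pairs, all_items, num_negatives=4, seed=0):
    """Label observed (user, item) pairs 1 and add sampled unseen items labeled 0."""
    rng = random.Random(seed)
    seen = defaultdict(set)
    for user, item in train_pairs:
        seen[user].add(item)
    users, items, labels = [], [], []
    for user, item in train_pairs:
        # positive example: an observed interaction
        users.append(user)
        items.append(item)
        labels.append(1)
        unseen = [i for i in all_items if i not in seen[user]]
        if not unseen:
            continue  # the user has seen every item: nothing to sample
        for _ in range(num_negatives):
            # negative example: an item this user never interacted with
            users.append(user)
            items.append(rng.choice(unseen))
            labels.append(0)
    return users, items, labels

movies = ["Terminal", "Forrest.", "Pianist", "Psycho", "Unhinged"]
train_pairs = [("u3", "Pianist"), ("u4", "Forrest."), ("u4", "Pianist")]
users, items, labels = make_implicit_data(train_pairs, movies)
print(labels.count(1), labels.count(0))  # 3 12
```

Negatives are drawn with replacement, so the same unseen movie can appear more than once for a user. For MovieLens-sized data, rebuilding `unseen` for every positive pair is slow; a per-user cache or rejection sampling would be the natural next step.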
Group recommendation
• The outcome of a typical recommendation engine is a personalized top-k recommendation list.
• What if a group of users wants to receive recommendations that they all appreciate collectively?
• A naïve approach towards group recommendation is the creation of a virtual user.
Group: Olivia, Julia, and Jacob.
Predictions for Olivia: rating(“Me before You”) = 1, rating(“Memento”) = 3
Predictions for Julia: rating(“Me before You”) = 1, rating(“Memento”) = 1
Predictions for Jacob: rating(“Me before You”) = 5, rating(“Memento”) = 3
Question. Which movie should the group watch together?
Answer. Consider them as a virtual user with the average rating.
Question. The average for both movies is 2.33! Alternative?
Answer. Consider them as a virtual user with least misery (the minimum rating).
Question. The least misery score for both is 1! Alternative?
Answer. …!
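The two virtual-user strategies can be sketched as aggregation functions over the members' predicted ratings. The function name `virtual_user` is illustrative. Both strategies end in a tie for this group, which is the point the questions above make.

```python
predictions = {
    "Olivia": {"Me before You": 1, "Memento": 3},
    "Julia": {"Me before You": 1, "Memento": 1},
    "Jacob": {"Me before You": 5, "Memento": 3},
}
movies = ["Me before You", "Memento"]

def virtual_user(predictions, movies, aggregate):
    """Predicted ratings of a virtual user obtained by aggregating all members."""
    return {m: aggregate([p[m] for p in predictions.values()]) for m in movies}

average = virtual_user(predictions, movies, lambda r: sum(r) / len(r))
least_misery = virtual_user(predictions, movies, min)
print(average)       # {'Me before You': 2.3333333333333335, 'Memento': 2.3333333333333335}
print(least_misery)  # {'Me before You': 1, 'Memento': 1}
```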
Solving the group recommendation problem
• Problem. Given a user group $G$, return the $k$ best items to recommend (denoted as $I_G$) to $G$ during period $p$ such that:
  • $I_G$ contains $k$ items;
  • every item in $I_G$ is new to all members of $G$;
  • there does not exist any other item whose score is higher than that of any item in $I_G$.
• Solution. A top-k processing algorithm is proposed.
  • We materialize lists such as static affinity, absolute preference, and dynamic affinity, and then scan all lists in round-robin fashion (like NRA), followed by a buffer update.
  • We terminate using a stopping condition.
[Basu Roy et al., VLDBJ’10 and ICDE’14]
Top-k processing
• Top-$k$ processing is a family of algorithms that aim at finding the items that best answer a user’s query.
• The performance of a top-k processing algorithm is measured in terms of the number of sequential accesses (SAs) and random accesses (RAs) it makes.
• For instance, you reach your third favorite song on an audio tape using sequential accesses, and on Spotify (or essentially on a hard drive) using a random access.
• The naïve computation of top-k is to compute the score of each item, sort the items in decreasing order, and return the top-k. When we have billions of items, this approach is infeasible.
• An alternative idea is to throw space at the problem, by pre-computing inverted lists and scanning them, with a stopping condition.
• Famous algorithms in this genre are TA and NRA. We review the latter here.
No-Random-Access (NRA) algorithm
• Access all lists sequentially and in parallel.
• After each cursor move, compute:
  • the worst-case score $W(r)$ and the best-case score $B(r)$ for each seen $r$ ($r$ is an item, e.g., a movie or a book);
  • sort all seen items on $W(r)$, breaking ties by $B(r)$;
  • if $W(r) > min_k$ then
    • add $r$ to the buffer;
    • $min_k = \min(W(r'))\ \forall r' \in$ buffer;
  • else if $B(r) > min_k$
    • add $r$ to the candidates.
• Stop if $B(d') \le min_k\ \forall d' \in$ candidates.
• Return the top-$k$ items.
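[Figure: inverted lists of predicted ratings per group member, each sorted in decreasing order, e.g., Julia: (Titanic, 1), (Terminal, 0.2), …; Jacob: (God Father, 3.3), (Titanic, 1.4), …; Olivia: (Titanic, 2.3), (God Father, 0.1), …; the lists are scanned top-down by sequential access (SA) rather than by random access (RA)]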
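A compact sketch of the NRA idea in Python, applied to the inverted lists of Jacob and Julia used in the example that follows. It assumes sum aggregation and non-negative scores, with an item missing from a list scoring 0. Unlike the slide, it recomputes the bounds after every round of sequential accesses (one SA per list) instead of maintaining a bounded buffer and candidate list. It stops once no other seen item, and no unseen item, can beat $min_k$. Because of these choices, its SA counts need not match the step-by-step example.

```python
def nra_top_k(lists, k):
    """No-Random-Access top-k with sum aggregation (a sketch).

    lists: one list per user of (item, score) pairs, sorted by score descending.
    Returns (top-k items, depth reached), where depth = number of SAs per list.
    """
    seen = {}                      # item -> {list index: score}
    last = [0.0] * len(lists)      # score under each cursor: upper bound for unseen entries
    worst = {}
    depth = 0
    for depth in range(1, max(map(len, lists), default=0) + 1):
        for j, lst in enumerate(lists):           # one sequential access per list
            if depth <= len(lst):
                item, score = lst[depth - 1]
                seen.setdefault(item, {})[j] = score
                last[j] = score
            else:
                last[j] = 0.0                     # exhausted list: nothing unseen left
        worst = {r: sum(s.values()) for r, s in seen.items()}
        best = {r: worst[r] + sum(last[j] for j in range(len(lists)) if j not in s)
                for r, s in seen.items()}
        ranked = sorted(seen, key=lambda r: (worst[r], best[r]), reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            min_k = min(worst[r] for r in top)
            # stop once no other seen item, and no unseen item, can beat min_k
            if all(best[r] <= min_k for r in rest) and sum(last) <= min_k:
                return top, depth
    return sorted(worst, key=worst.get, reverse=True)[:k], depth

jacob = [("r1", 1.5), ("r2", 1.5), ("r3", 1.25), ("r4", 1), ("r5", 1),
         ("r6", 0.8), ("r7", 0.75), ("r8", 0.75), ("r9", 0.5), ("r10", 0.5)]
julia = [("r7", 5), ("r3", 4.5), ("r4", 4.5), ("r2", 4.25), ("r1", 3.5),
         ("r9", 3), ("r5", 2), ("r6", 1), ("r8", 0.5), ("r10", 0.5)]
top, depth = nra_top_k([jacob, julia], k=3)
print(top, depth)  # ['r7', 'r2', 'r3'] 7
```

The scan stops after 7 of the 10 entries in each list. At that point r7, r2, and r3 (5.75 each) are known exactly, and no remaining item can exceed 5.75; the next best is r4 at 5.5.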
NRA example (step 1)
• We initialize cursors at the head of each list. We assume $k = 3$ (hence the buffer size) and that we have space to keep track of 10 candidates.
Jacob’s inverted list          Julia’s inverted list
movie  predicted rating        movie  predicted rating
r1     1.5                     r7     5
r2     1.5                     r3     4.5
r3     1.25                    r4     4.5
r4     1                       r2     4.25
r5     1                       r1     3.5
r6     0.8                     r9     3
r7     0.75                    r5     2
r8     0.75                    r6     1
r9     0.5                     r8     0.5
r10    0.5                     r10    0.5
Buffer (positions 1-3, with worst-case and best-case scores): empty
Candidates (positions 4-10): empty
0 SAs have been performed so far.
$min_k$ = ?
NRA example (step 2)
• We move the cursors sequentially.
• We complete the buffer by adding movies r7, r1, and r2 to it.
3 SAs have been performed so far.
position  movie  worst-case score  best-case score
Buffer
1         r7     5                 6.5
2         r1     1.5               6.5
3         r2     1.5               6
Candidates
4-10      (empty)
$min_k$ = 1.5
NRA example (step 3)
• Once the buffer is complete, we check whether a new movie is worth adding to the buffer.
4 SAs have been performed so far.
position  movie  worst-case score  best-case score
Buffer
1         r7     5                 6.5
2         r1     1.5               6.5
3         r2     1.5               6
Candidates
4-10      (empty)
$min_k$ = 1.5
Given $worstcase(r3) = 4.5$ and $bestcase(r3) = 6$, should r3 be added to the buffer?
Given that $worstcase(r3) > min_k$, then YES!
NRA example (step 4)
• Some movies gradually move from the buffer to the candidates (e.g., r2).

5 SAs have been performed so far.

position  movie  worst-case score  best-case score
Buffer
1         r7     5                 6.5
2         r3     4.5               6
3         r1     1.5               6.5
Candidates
4         r2     1.5               6

r3 enters the buffer and pushes r2 out to the candidates. mink gets updated but stays at 1.5.
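Moving movies between the buffer and the candidates amounts to re-ranking every seen movie by its worst-case score. A minimal sketch, assuming ties are broken by the order in which movies were first seen; `split_buffer` is a hypothetical helper, fed here with the bounds from the step 4 table.

```python
def split_buffer(bounds, k=3):
    """Split seen movies into the buffer (k highest worst-case scores) and the
    candidates, and return (buffer, candidates, mink).

    bounds maps movie -> (worst_case, best_case). sorted() is stable, so movies
    with equal worst-case scores keep their insertion order.
    """
    ranked = sorted(bounds, key=lambda movie: bounds[movie][0], reverse=True)
    buffer, candidates = ranked[:k], ranked[k:]
    mink = min(bounds[movie][0] for movie in buffer)
    return buffer, candidates, mink


# Bounds from the step 4 table, listed in the order the movies were first seen.
state = {"r1": (1.5, 6.5), "r7": (5, 6.5), "r2": (1.5, 6), "r3": (4.5, 6)}
print(split_buffer(state))  # (['r7', 'r3', 'r1'], ['r2'], 1.5)
```

The same helper also reproduces the mink jump after r4 enters the buffer at step 7 (from 1.5 to 4.5).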
NRA example (step 5)
• We have to check the stopping condition after each SA.
• We stop if max(bestcase(candidates)) < mink.

5 SAs have been performed so far. r3 has now been seen in both lists, so its score is exact: 4.5 + 1.25 = 5.75.

position  movie  worst-case score  best-case score
Buffer
1         r3     5.75              5.75
2         r7     5                 6.5
3         r1     1.5               6.5
Candidates
4         r2     1.5               6

mink = 1.5 and max(bestcase(candidates)) = 6.
⚠ Should we stop? 6 ≮ 1.5, so NO!
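The stopping rule fits in a one-line check. In this sketch the hypothetical `should_stop` helper implements the slide's rule with its first two arguments. The optional `unseen_best` argument is an assumption taken from the usual formulation of NRA, which also requires that a movie not yet seen in any list (whose best case is the sum of the last ratings read) cannot beat mink; by default it is disabled.

```python
def should_stop(buffer_worsts, candidate_bests, unseen_best=float("-inf")):
    """Stop once nothing outside the buffer can beat mink.

    buffer_worsts:   worst-case scores of the k buffered movies.
    candidate_bests: best-case scores of the candidates (the slide's rule).
    unseen_best:     optional extra check (assumption): best score a movie not
                     seen yet could reach; the default disables the check.
    """
    mink = min(buffer_worsts)
    return max(list(candidate_bests) + [unseen_best]) < mink


# Step 5: buffer r3, r7, r1 and a single candidate r2.
print(should_stop([5.75, 5, 1.5], [6]))          # False, since 6 is not < 1.5
# Step 7: mink = 4.5, candidates r1 and r2.
print(should_stop([5.75, 5, 4.5], [6.5, 6]))     # False, since 6.5 is not < 4.5
```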
NRA example (step 6)
• For each newly seen movie, we check whether it should be added to the buffer. After the buffer update, we check the stopping condition.

6 SAs have been performed so far; the 6th reads r4 (rating 4.5) from Julia’s list.

position  movie  worst-case score  best-case score
Buffer
1         r3     5.75              5.75
2         r7     5                 6.5
3         r1     1.5               6.5
Candidates
4         r2     1.5               6

mink = 1.5
⚠ Should we stop? 6 ≮ 1.5, so NO!
Given worsecase(r4) = 4.5 and bestcase(r4) = 5.75 (4.5 + 1.25), should r4 be added to the buffer?
Given that worsecase(r4) > mink, YES!
NRA example (step 7)
• We update mink after any buffer update.

6 SAs have been performed so far.

position  movie  worst-case score  best-case score
Buffer
1         r3     5.75              5.75
2         r7     5                 6.5
3         r4     4.5               5.75
Candidates
4         r1     1.5               6.5
5         r2     1.5               6

mink goes from 1.5 to 4.5.
⚠ Should we stop? 6.5 ≮ 4.5, so NO!
NRA example (step 8)
• We move the cursors sequentially.

7 SAs have been performed so far. The 7th SA reads r4 (rating 1) from Jacob’s list, so r4 has now been seen in both lists and its score is exact: 4.5 + 1 = 5.5.

position  movie  worst-case score  best-case score
Buffer
1         r3     5.75              5.75
2         r4     5.5               5.5
3         r7     5                 6.5
Candidates
4         r1     1.5               6.5
5         r2     1.5               6

mink = 5
⚠ Should we stop? 6.5 ≮ 5, so NO!
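The slides only say that the cursors move sequentially. The sketch below assumes a round-robin order starting with Jacob’s list, which reproduces the SA counts on the slides (for example, the 7th SA reads r4 with rating 1 from Jacob’s list). `sorted_accesses` is a hypothetical generator name.

```python
from itertools import islice

JACOB = [("r1", 1.5), ("r2", 1.5), ("r3", 1.25), ("r4", 1), ("r5", 1),
         ("r6", 0.8), ("r7", 0.75), ("r8", 0.75), ("r9", 0.5), ("r10", 0.5)]
JULIA = [("r7", 5), ("r3", 4.5), ("r4", 4.5), ("r2", 4.25), ("r1", 3.5),
         ("r9", 3), ("r5", 2), ("r6", 1), ("r8", 0.5), ("r10", 0.5)]
NAMES = ["Jacob", "Julia"]


def sorted_accesses(lists):
    """Yield (sa_number, list_index, movie, rating), moving one cursor per SA
    in round-robin order: list 0 row 0, list 1 row 0, list 0 row 1, ..."""
    sa = 0
    for row in range(max(len(lst) for lst in lists)):
        for i, lst in enumerate(lists):
            if row < len(lst):          # skip lists that are already exhausted
                sa += 1
                movie, rating = lst[row]
                yield sa, i, movie, rating


# Print the first 13 SAs, i.e. the accesses covered by steps 3 to 14.
for sa, i, movie, rating in islice(sorted_accesses([JACOB, JULIA]), 13):
    print(f"SA {sa}: {NAMES[i]}'s list -> {movie} ({rating})")
```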
NRA example (step 9)
• We move the cursors sequentially.

8 SAs have been performed so far. The 8th SA reads r2 (rating 4.25) from Julia’s list, so r2’s score is now exact (1.5 + 4.25 = 5.75) and r2 moves back into the buffer, pushing r7 out to the candidates.

position  movie  worst-case score  best-case score
Buffer
1         r3     5.75              5.75
2         r2     5.75              5.75
3         r4     5.5               5.5
Candidates
4         r7     5                 6.5
5         r1     1.5               6.5

mink = 5.5
⚠ Should we stop? 6.5 ≮ 5.5, so NO!
NRA example (step 10)
• Some movies are not worth adding to either the buffer or the candidates.

9 SAs have been performed so far; the 9th reads r5 (rating 1) from Jacob’s list.

position  movie  worst-case score  best-case score
Buffer
1         r3     5.75              5.75
2         r2     5.75              5.75
3         r4     5.5               5.5
Candidates
4         r7     5                 6.5
5         r1     1.5               6.5

mink = 5.5
⚠ Should we stop? 6.5 ≮ 5.5, so NO!
Given worsecase(r5) = 1 and bestcase(r5) = 5.25 (1 + 4.25), should r5 be added to the buffer?
Given that worsecase(r5) ≯ mink, NO!
Can we still keep it as a candidate? Given that bestcase(r5) ≯ mink, NO!
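The three outcomes for a newly seen movie (buffer, candidate, or neither) follow from comparing its two bounds with mink. A minimal sketch with a hypothetical `classify` helper; when it returns "buffer", the movie with the lowest worst-case score is pushed out to the candidates, as in step 4.

```python
def classify(worst, best, mink):
    """Where a newly seen movie goes, following the two checks on the slides."""
    if worst > mink:
        return "buffer"        # displaces the buffer's lowest worst-case movie
    if best > mink:
        return "candidate"     # might still overtake a buffered movie later
    # Best cases only shrink and mink never decreases, so it cannot reach the top k.
    return "discard"


print(classify(4.5, 5.75, 1.5))   # r4 at step 6: buffer
print(classify(1, 5.25, 5.5))     # r5 at step 10: discard
print(classify(0.8, 4.3, 5.5))    # r6 at step 12: discard
```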
NRA example (step 11)
• We move the cursors sequentially.

10 SAs have been performed so far. The 10th SA reads r1 (rating 3.5) from Julia’s list, so r1’s score is now exact: 1.5 + 3.5 = 5.

position  movie  worst-case score  best-case score
Buffer
1         r3     5.75              5.75
2         r2     5.75              5.75
3         r4     5.5               5.5
Candidates
4         r7     5                 6.5
5         r1     5                 5

mink = 5.5
⚠ Should we stop? 6.5 ≮ 5.5, so NO!
NRA example (step 12)
• We move the cursors sequentially.

11 SAs have been performed so far; the 11th reads r6 (rating 0.8) from Jacob’s list.

position  movie  worst-case score  best-case score
Buffer
1         r3     5.75              5.75
2         r2     5.75              5.75
3         r4     5.5               5.5
Candidates
4         r7     5                 6.5
5         r1     5                 5

mink = 5.5
⚠ Should we stop? 6.5 ≮ 5.5, so NO!
Given worsecase(r6) = 0.8 and bestcase(r6) = 4.3 (0.8 + 3.5), should r6 be added to the buffer?
Given that worsecase(r6) ≯ mink, NO!
Can we still keep it as a candidate? Given that bestcase(r6) ≯ mink, NO!
NRA example (step 13)
• We move the cursors sequentially.

12 SAs have been performed so far; the 12th reads r9 (rating 3) from Julia’s list.

position  movie  worst-case score  best-case score
Buffer
1         r3     5.75              5.75
2         r2     5.75              5.75
3         r4     5.5               5.5
Candidates
4         r7     5                 6.5
5         r1     5                 5

mink = 5.5
⚠ Should we stop? 6.5 ≮ 5.5, so NO!
We add the movie r9 to neither the buffer nor the candidates: worsecase(r9) = 3 and bestcase(r9) = 3 + 0.8 = 3.8 are both below mink.
NRA example (step 14)
• We move the cursors sequentially.

13 SAs have been performed so far.

position  movie  worst-case score  best-case score
Buffer
1         r3     5.75              5.75
2         r2     5.75              5.75
3         r4     5.5               5.5
Candidates
4         r7     5                 6.5
5         r1     5                 5

mink = 5.5
⚠ Should we stop? 6.5 ≮ 5.5, so NO!
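Putting the steps together gives a compact NRA loop. This is a sketch under stated assumptions (round-robin sorted access starting with Jacob’s list, equal-length lists, unread ratings counted as 0 in the worst case), not the author's code, and `nra` is a hypothetical function name. It differs from the slides in two ways: best cases are recomputed from the current cursor positions, and the stopping check also covers movies not seen yet. On this example it stops after 13 SAs and returns r7, r2 and r3, whose exact scores are all 5.75.

```python
# Minimal NRA (No Random Access) sketch for the two-user example, k = 3.
JACOB = [("r1", 1.5), ("r2", 1.5), ("r3", 1.25), ("r4", 1), ("r5", 1),
         ("r6", 0.8), ("r7", 0.75), ("r8", 0.75), ("r9", 0.5), ("r10", 0.5)]
JULIA = [("r7", 5), ("r3", 4.5), ("r4", 4.5), ("r2", 4.25), ("r1", 3.5),
         ("r9", 3), ("r5", 2), ("r6", 1), ("r8", 0.5), ("r10", 0.5)]


def nra(lists, k):
    """Return (top-k movies by worst-case score, number of sorted accesses)."""
    m, n = len(lists), len(lists[0])
    seen = {}                                   # movie -> {list index: rating}
    for step in range(m * n):
        i, row = step % m, step // m            # list to read, cursor depth
        movie, rating = lists[i][row]
        seen.setdefault(movie, {})[i] = rating
        # Last rating read in each list; inf while a list is still unread.
        last = []
        for j in range(m):
            r = row if j <= i else row - 1      # deepest row read in list j
            last.append(lists[j][r][1] if r >= 0 else float("inf"))
        if len(seen) < k:
            continue
        worst = {x: sum(s.values()) for x, s in seen.items()}
        best = {x: worst[x] + sum(last[j] for j in range(m) if j not in s)
                for x, s in seen.items()}
        top = sorted(worst, key=worst.get, reverse=True)[:k]   # the buffer
        mink = min(worst[x] for x in top)
        # Best score any other movie could still reach: candidates + unseen.
        challengers = [best[x] for x in seen if x not in top] + [sum(last)]
        if max(challengers) < mink:
            return top, step + 1
    # All lists exhausted: every worst-case score is now exact.
    worst = {x: sum(s.values()) for x, s in seen.items()}
    return sorted(worst, key=worst.get, reverse=True)[:k], m * n


print(nra([JACOB, JULIA], 3))   # (['r7', 'r2', 'r3'], 13)
```

Only 13 of the 20 possible sorted accesses are needed here, which is the point of NRA: it can certify the top k without reading the lists to the end.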