Topological approach to Wikipedia
article recommendation
Author:
Maksym Opirskyi
opirskyi@ucu.edu.ua
Supervisor:
Petro Sarkanych
sarkanyp@coventry.ac.uk
1. Institute for Condensed Matter Physics: Lviv, UA
2. L4 Collaboration & Doctoral College for the Statistical Physics of
Complex Systems, Leipzig-Lorraine-Lviv-Coventry, Europe
3. Ukrainian Catholic University
Plan
● Introduction and motivation
● Problem statement and research questions
● Related work
● Data
● Proposed method
● Evaluation and results
● Conclusion
Introduction (Information as a network)
Wikipedia article recommendation
Source: https://www.wikipedia.org/
Source: https://www.flickr.com/photos/sjcockell/8425835703
Motivation
Source: https://en.wikipedia.org/wiki/Programming_language_theory#See_also
● Manual search is hard
● Millions of articles
● Not every article has a “See also” section
● “See also” lists are created manually
● Enhancing information search
Problem formulation and research gap
● How can the top-k articles related to a given article be found and ranked? (ranking problem)
● No such tool currently exists
● Tools that enhance the Wikipedia experience use neither the topological information of the Wiki graph nor the history of human navigation on it
Research Questions
● Does topology reflect the relatedness of articles?
● Is this information sufficient? If not, does navigation information enrich it and improve the quality of the recommendations?
● Which is more important for good recommendations: purely structural information or navigational information?
Related work
● Navigation in information networks
● Patterns in human navigation
● Random walks
● Recommender systems
Navigation in information networks
● Kleinberg: (some) networks are efficiently navigable!
● Semantic relatedness based on game paths (West et al.)
Patterns in search strategies
● “Move to a hub, then zoom in” pattern (West and Leskovec)
● Back-clicking patterns (West and Leskovec)
● Article structure matters (Lamprecht et al.)
Random walks
Source: https://www.wikiwand.com/en/Random_walk
Source: https://www.slideshare.net/pavankapanipathi/random-walk-on-graphs
Recommender systems
● Semantic relatedness of Wikipedia articles using personalized
PageRank (Yeh et al.)
● Random walk based recommender system for movie
recommendation (Bogers)
Data
● Wikispeedia graph (condensed
version of Wikipedia):
○ Articles - 4,604
○ Links - 119,882
● Collection of players’ game paths
(on Wikispeedia graph):
○ Finished paths - 51,318
○ Unfinished paths - 24,875
● Plain text of Wikispeedia articles
~94 MiB
● Wikipedia dump (raw) ~5.7 GiB (for
obtaining ground truth
recommendations)
Source: http://allthingsgraphed.com/2015/09/16/what-is-transhumanism-wikipedia/
Wikispeedia graph
● Average shortest path length is ~3.2
● Degree distribution resembles a power law
● Average clustering coefficient is 0.11 (corresponding Erdős-Rényi graph: 0.0056)
● Global clustering coefficient is 0.1 (corresponding Erdős-Rényi graph: 0.011)
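As a rough sanity check on the Erdős-Rényi baseline: in a directed random graph where each possible link exists independently with probability p, the expected clustering coefficient is approximately p, so it can be estimated from the article and link counts on the Data slide alone. The sketch below assumes the 119,882 links are directed and contain no self-loops; the function name is illustrative and not taken from the thesis.

```python
def er_link_probability(n_nodes: int, n_links: int, directed: bool = True) -> float:
    """Link probability p of an Erdős-Rényi graph with the same size and link count."""
    possible = n_nodes * (n_nodes - 1)  # ordered pairs, self-loops excluded
    if not directed:
        possible //= 2                  # unordered pairs
    return n_links / possible


# Counts from the Data slide.
N_ARTICLES = 4604
N_LINKS = 119882

p_directed = er_link_probability(N_ARTICLES, N_LINKS)
mean_out_degree = N_LINKS / N_ARTICLES

print(f"p (directed)    = {p_directed:.4f}")
print(f"mean out-degree = {mean_out_degree:.1f}")
```

The resulting p is about 0.0057, in line with the 0.0056 quoted for the average clustering coefficient of the corresponding Erdős-Rényi graph, and roughly twenty times smaller than the 0.11 measured on Wikispeedia.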
Proposed method
● Random walk agents
● Recommendation prediction
Random walk agents
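● Random walk agents
○ Options for the transition probability $p_{ij}$ from node i to node j:
■ [formula not preserved in this transcript]
■ [formula not preserved in this transcript]
■ $p_{ij} \propto n_{ij}$, where $n_{ij}$ is the number of player transitions from node i to node j
■ $p_{ij} = 1/k_i$, where $k_i$ is the number of outgoing links of node i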
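A minimal sketch of two such agents, assuming the graph is stored as an adjacency dict and the game paths have been reduced to transition counts. It covers only the uniform agent (equal probability for each outgoing link) and the path-based agent (probability proportional to player transitions). The toy graph, the counts, the fallback to uniform when no transition was observed, and the ranking by visit count are all assumptions made for illustration, not details taken from the slides.

```python
import random
from collections import Counter


def uniform_probs(graph, node):
    """Uniform agent: each outgoing link of `node` is equally likely."""
    out = graph[node]
    return {nbr: 1.0 / len(out) for nbr in out}


def path_probs(graph, transitions, node):
    """Path-based agent: probability proportional to observed player transitions.

    Falls back to the uniform agent when no transition out of `node`
    was recorded (an assumption of this sketch).
    """
    counts = {nbr: transitions.get((node, nbr), 0) for nbr in graph[node]}
    total = sum(counts.values())
    if total == 0:
        return uniform_probs(graph, node)
    return {nbr: c / total for nbr, c in counts.items()}


def random_walk(graph, start, probs_fn, length, rng):
    """Walk up to `length` steps from `start`; stop early at a node without out-links."""
    path, node = [start], start
    for _ in range(length):
        if not graph[node]:
            break
        probs = probs_fn(node)
        nbrs = list(probs)
        node = rng.choices(nbrs, weights=[probs[n] for n in nbrs], k=1)[0]
        path.append(node)
    return path


def recommend(graph, start, probs_fn, n_walks=200, length=5, k=3, seed=0):
    """Rank nodes by the number of walks from `start` that visit them."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(n_walks):
        walk = random_walk(graph, start, probs_fn, length, rng)
        visits.update(set(walk) - {start})
    return [node for node, _ in visits.most_common(k)]


# Toy data, made up for illustration.
graph = {
    "Monarchy": ["Europe", "England", "France"],
    "Europe": ["France", "England"],
    "England": ["Europe", "Monarchy"],
    "France": ["Europe"],
}
transitions = {("Monarchy", "England"): 8, ("Monarchy", "Europe"): 2}

uniform = recommend(graph, "Monarchy", lambda n: uniform_probs(graph, n))
path_based = recommend(graph, "Monarchy", lambda n: path_probs(graph, transitions, n))
print(uniform, path_based)
```

Passing the transition rule as a function keeps the walker itself agnostic, so further agents can be plugged in without changing `random_walk`.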
Recommendation prediction
● Obtain ground truth recommendations for articles.
● Compute metrics from previous slide for each pair (article,
recommendation). These are positive examples.
● For each article that has ground truth recommendations, randomly
sample articles that are not recommendations for the current one
and compute metrics for them. These are negative examples.
● Assign the corresponding labels and train a binary classifier.
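The steps above can be sketched as follows. This is a hedged illustration, not the implementation from the thesis: the slides do not name the classifier, so a small logistic regression trained by gradient descent stands in for it, and `features`, `toy_scores`, the article names and all numbers are hypothetical.

```python
import math
import random


def build_examples(features, ground_truth, articles, n_neg, rng):
    """Return (X, y): positives from ground truth, negatives sampled at random.

    `features(a, b)` is a hypothetical callable returning one float per metric.
    """
    X, y = [], []
    for article, recs in ground_truth.items():
        for rec in recs:
            X.append(features(article, rec))
            y.append(1)
        candidates = [a for a in articles if a != article and a not in recs]
        for neg in rng.sample(candidates, min(n_neg, len(candidates))):
            X.append(features(article, neg))
            y.append(0)
    return X, y


def predict_proba(w, b, x):
    """Probability that the pair described by `x` is a true recommendation."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=1.0, epochs=1000):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


# Toy data, made up for illustration: two metrics per (article, candidate) pair.
articles = ["A", "B", "C", "D", "E", "F"]
ground_truth = {"A": {"B"}, "C": {"D"}}
toy_scores = {("A", "B"): [0.9, 0.7], ("C", "D"): [0.8, 0.9]}


def features(a, b):
    return toy_scores.get((a, b), [0.1, 0.2])


X, y = build_examples(features, ground_truth, articles, n_neg=2, rng=random.Random(0))
w, b = train_logreg(X, y)
print(predict_proba(w, b, [0.9, 0.7]), predict_proba(w, b, [0.1, 0.2]))
```

Sorting candidate articles by `predict_proba` turns the binary classifier into a ranker, which is what the top-k formulation of the problem needs.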
Agent results, distance
Agent results, visit frequency vs degree
Agent results, similarity
Evaluation
● Train/test split; predict the recommended entities for the test set
● Metrics:
○ MAP@k
○ V@k, the normalized total reciprocal rank
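For reference, a small sketch of MAP@k in one common formulation: the precision at each rank that holds a relevant item is summed and divided by min(k, number of relevant items). The exact variant used in the thesis is not shown in this transcript, so treat the normalization as an assumption; V@k is left out because its formula is not preserved here. The example rankings are made up.

```python
def average_precision_at_k(recommended, relevant, k):
    """AP@k: sum of precision@i at relevant ranks i <= k, over min(k, |relevant|).

    Assumes `recommended` contains no duplicates.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(k, len(relevant))


def map_at_k(all_recommended, all_relevant, k):
    """MAP@k: AP@k averaged over all query articles."""
    scores = [
        average_precision_at_k(rec, rel, k)
        for rec, rel in zip(all_recommended, all_relevant)
    ]
    return sum(scores) / len(scores)


# Made-up rankings and ground truth for two query articles.
rankings = [["Europe", "England", "Tungsten"], ["Tungsten", "Europe", "Malta"]]
truth = [{"Europe", "England"}, {"Europe"}]
print(map_at_k(rankings, truth, k=3))  # 0.75
```

The first query scores 1.0 (both relevant items at ranks 1 and 2) and the second 0.5 (its single relevant item at rank 2), giving 0.75 overall.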
Classification results
Metrics significance
Conclusion
● Information encoded in human trails is useful for recommending similar articles
● Simple structural information such as degree (uniform agent) produces the second-best results
● Long vs. short walks: which is better?
● On average, similarity does not change over time and stays below the similarity between an article and its true recommendations
● None of the considered transition probabilities alone produces very good recommendations; however, their linear combination works much better in the context of prediction
Contribution
● Designed metrics (transition probabilities) for random walk agents
● Implemented recommender system based on the agents’ walks
● Discovered that walks are local and that similarity is stable
● Proposed a method to combine metrics (linear combination)
● Designed a procedure for inferring metric importance
Thank you! Questions?
Recommendation examples
Path based:
○ United States
○ Europe
○ United Kingdom
○ England
○ Earth
○ World War II
○ France
○ Africa
○ Germany
○ North America
Uniform:
○ United States
○ Europe
○ United Kingdom
○ France
○ Latin
○ India
○ World War II
○ Germany
○ English language
○ Spain
Pearson:
○ Harry Potter
○ Das Kapital
○ Cambodia
○ Sistine Chapel ceiling
○ Las Vegas, Nevada
○ Palm oil
○ Sigmund Freud
○ Battle of Marathon
○ Hinduism
○ Afghanistan
Tf-idf (alpha = 2):
○ Europe
○ England
○ Judaism
○ Italy
○ Turkey
○ Spain
○ Bermuda
○ Malta
○ Geography
○ Tungsten
Query: Monarchy, top 10 most frequently occurring recommendations
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
 

Master defence 2020 - Maksym Opirskyi - Topological Approach to Wikipedia Article Recommendation

  • 1. Topological approach to Wikipedia article recommendation Author: Maksym Opirskyi opirskyi@ucu.edu.ua Supervisor: Petro Sarkanych sarkanyp@coventry.ac.uk 1. Institute for Condensed Matter Physics: Lviv, UA 2. L4 Collaboration & Doctoral College for the Statistical Physics of Complex Systems, Leipzig-Lorraine-Lviv-Coventry, Europe 3. Ukrainian Catholic University
  • 2. Plan ● Introduction and motivation ● Problem statement and research questions ● Related work ● Data ● Proposed method ● Evaluation and results ● Conclusion
  • 4. Wikipedia article recommendation Source: https://www.wikipedia.org/ Source: https://www.flickr.com/photos/sjcockell/8425835703
  • 5. Motivation Source: https://en.wikipedia.org/wiki/Programming_language_theory#See_also ● Manual search is hard ● Millions of articles ● Not every article has a “See also” section ● “See also” sections are created manually ● Enhancement of information search
  • 6. Problem formulation and research gap ● How can we find and list the top k articles related to a given one? (ranking problem)
  • 7. Problem formulation and research gap ● How can we find and list the top k articles related to a given one? (ranking problem) ● There is currently no such tool
  • 8. Problem formulation and research gap ● How can we find and list the top k articles related to a given one? (ranking problem) ● There is currently no such tool ● Tools that enhance the experience with Wikipedia use neither the topological information of the Wiki graph nor the history of human navigation on it
  • 9. Research Questions ● Does topology reflect the relatedness of articles?
  • 10. Research Questions ● Does topology reflect the relatedness of articles? ● Is this information sufficient? If not, does navigation information enrich it and improve the quality of the recommendation?
  • 11. Research Questions ● Does topology reflect the relatedness of articles? ● Is this information sufficient? If not, does navigation information enrich it and improve the quality of the recommendation? ● Which of the two, purely structural information or navigational information, matters more for a good recommendation?
  • 12. Related work ● Navigation in information networks ● Patterns in human navigation ● Random walks ● Recommender systems
  • 13. Navigation in information networks ● Kleinberg: (some) networks are efficiently navigable! ● Semantic relatedness based on game paths (West et al.)
  • 14. Patterns in search strategies ● Move to hub, then zoom in pattern (West and Leskovec) ● Back clicking patterns (West and Leskovec) ● Article structure matters (Lamprecht et al.)
  • 16. Recommender systems ● Semantic relatedness of Wikipedia articles using personalized PageRank (Yeh et al.) ● Random walk based recommender system for movie recommendation (Bogers)
  • 17. Data ● Wikispeedia graph (condensed version of Wikipedia): ○ Articles - 4,604 ○ Links - 119,882 ● Collection of players’ game paths (on Wikispeedia graph): ○ Finished paths - 51,318 ○ Unfinished paths - 24,875 ● Plain text of Wikispeedia articles ~94 MiB ● Wikipedia dump (raw) ~5.7 GiB (for obtaining ground truth recommendations) Source: http://allthingsgraphed.com/2015/09/16/what-is-transhumanism-wikipedia/
  • 18. Wikispeedia graph ● Average shortest path length is ~3.2 ● Degree distribution resembles a power law ● Average clustering coefficient is 0.11 (corresponding Erdős-Rényi graph: 0.0056) ● Global clustering coefficient is 0.1 (corresponding Erdős-Rényi graph: 0.011)
  • 19. Proposed method ● Random walk agents ● Recommendation prediction
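  • 20. Random walk agents ● Transition probability options (the formulas were images and did not survive in this transcript): ■ [formula missing] ■ [formula missing] ■ [formula missing], where [symbol missing] is the number of player transitions from node i to node j ■ [formula missing], where [symbol missing] is the number of outgoing links of node i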
  • 21. Recommendation prediction ● Obtain ground truth recommendations for articles. ● Compute the metrics from the previous slide for each pair (article, recommendation). These are positive examples. ● For each article that has ground truth recommendations, randomly sample articles that are not recommendations for it and compute the metrics for them. These are negative examples. ● Assign the corresponding labels and train a binary classifier.
  • 23. Agent results: visit frequency vs degree
  • 25. Evaluation ● Train/test split; predict the recommended entities of the test set ● Metrics: ○ MAP@k ○ V@k, normalized total reciprocal rank
  • 28. Conclusion ● Information encoded in human trails is useful for recommending similar articles ● Simple structural information such as degree (uniform agent) produces the second-best results ● Long vs short walks: which is better? ● On average, similarity does not change with time and is lower than the similarity between an article and its true recommendations ● None of the considered transition probabilities produces very good recommendations on its own, but their linear combination works much better in the context of prediction
  • 29. Contribution ● Designed metrics (transition probabilities) for random walk agents ● Implemented a recommender system based on the agents’ walks ● Discovered that walks are local and that similarity is stable ● Proposed a method to combine metrics (linear combination) ● Designed a procedure for inferring metric importances
  • 31. Recommendation examples Query: Monarchy, top 10 most frequently occurring recommendations Path based: ○ United States ○ Europe ○ United Kingdom ○ England ○ Earth ○ World War II ○ France ○ Africa ○ Germany ○ North America Uniform: ○ United States ○ Europe ○ United Kingdom ○ France ○ Latin ○ India ○ World War II ○ Germany ○ English language ○ Spain Pearson: ○ Harry Potter ○ Das Kapital ○ Cambodia ○ Sistine Chapel ceiling ○ Las Vegas, Nevada ○ Palm oil ○ Sigmund Freud ○ Battle of Marathon ○ Hinduism ○ Afghanistan Tf-idf (alpha = 2): ○ Europe ○ England ○ Judaism ○ Italy ○ Turkey ○ Spain ○ Bermuda ○ Malta ○ Geography ○ Tungsten
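
The uniform agent from the slides can be illustrated with a minimal sketch. It assumes that each step picks one of the current node's outgoing links with equal probability (1 over the number of outgoing links) and that recommendations are the articles visited most often over many walks started at the query article. The function names `uniform_walk` and `recommend`, the parameters `n_walks` and `length`, and the links in `toy_graph` are hypothetical and are not taken from the slides or from the Wikispeedia data.

```python
import random
from collections import Counter


def uniform_walk(graph, start, length, rng):
    """Walk up to `length` steps, picking each outgoing link with equal probability."""
    path = [start]
    node = start
    for _ in range(length):
        neighbours = graph.get(node, [])
        if not neighbours:  # dead end: stop the walk early
            break
        node = rng.choice(neighbours)
        path.append(node)
    return path


def recommend(graph, query, k=10, n_walks=1000, length=5, seed=0):
    """Return the k articles visited most often by walks started at `query`."""
    rng = random.Random(seed)  # fixed seed so repeated calls give the same ranking
    visits = Counter()
    for _ in range(n_walks):
        for node in uniform_walk(graph, query, length, rng)[1:]:
            if node != query:  # never recommend the query article itself
                visits[node] += 1
    return [node for node, _ in visits.most_common(k)]


# Hypothetical toy link structure (adjacency lists of outgoing links).
toy_graph = {
    "Monarchy": ["Europe", "United Kingdom"],
    "Europe": ["United Kingdom", "France"],
    "United Kingdom": ["Europe", "Monarchy"],
    "France": ["Europe"],
}

print(recommend(toy_graph, "Monarchy", k=3))
```

Under the same assumptions, the path-based agents would differ only in how the next node is chosen: `rng.choice` would be replaced by a weighted draw using the player transition counts.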