SlideShare a Scribd company logo
Re-ranking Web
Documents Based
on Personal
Preferences
Satarupa Guha, Priyanshu Agrawal, Shubham Sangal
IIIT Hyderabad
• Each Individual is unique.
• Search Engines should re-
rank its results based on
his profile so that his
specific requirements can
be met
Personalisation… Why ??
• Personalisation works using
Individualisation and
Contextualisation
• Individualisation means creating
each User’s Individual Profile
using his Long-Term History
• Contextualisation means
Identifying relation between
sessions using Short-Term
History
Dataset… Fully Anonymized
• Dataset was provided by Yandex ( Almost 16 GB of train
data and 400 MB of test Data)
• Search Data for 30 days from a large city was collected.
• To allay privacy concerns the user data is fully anonymized.
• Only meaningless numeric IDs of users, queries, query terms,
sessions, URLs and their domains are released.
• Training was done on data of first 27 days and testing on last
3 days.
Data Format
The log represents a stream of user actions, with each
line representing a session metadata, a query action, or a click
action. Each line contains tab-separated values according to
the following format:
Session metadata (TypeOfRecord = M):
SessionID TypeOfRecord Day USERID
Query action (TypeOfRecord = Q or T):
SessionID TimePassed TypeOfRecord SERPID QueryID
ListOfTerms ListOfURLsAndDomains
Click action (TypeOfRecord = C):
SessionID TimePassed TypeOfRecord SERPID URLID
Approach
Feature Engineering
Train Data Test Data
Basic Features User-Specific Features Pair-wise Features
Feature Representation
Regression Model
Trained Model
Re-ranked URLs
Train Data
Test
Data
Feature Engineering
• Initially each URL is labelled with a relevance label of:
o 0 (irrelevant) : no clicks or click less than 50 time units
o 1 (relevant): clicks with 50 to 399 time units
o 2 (highly relevant): dwell time grater than 400 units
• Basic Features used include URL-query instance such as
original position of URL for query , URLId , DomainId of
URLs , TermIds of query and various joint features such as
URLId X position , URLId X QueryId , DomainId X TermId
etc.
• User Specific Features include for each user , dividing each
URL in 5 categories based on relevance labels and whether
URL was previously displayed or skipped to the User.
• Pairwise Features include for each query checking the
relative position of a pair of URL
f(URLi, URLj) = 1 if URLi occurs at better position than
URLj
f(URLi, URLj) = -1 if URLj occurs at better position than
URLi
Feature Engineering
Logistic Regression
• Simple Logistic Regression Model was
used.
• URL with positive relevance was
labelled 1. Rest labelled -1.
• Relevance Labels assigned to each
URL w.r.t to each User were used as
Weight.
• Out of Core learning Algorithms
are used as they can work with huge
Data in hours.
Vowpal Wabbit for
Logistic Regression
• Fast Machine Learning tool.
• Works on huge data in a
parallel manner.
• RAM efficient.
• Fast Convergence to a Good
Predictor.
• Handles TBs of Data in Matter
of Hours.
Machine Learning
Evaluation and Results
• We used NDCG ( Normalized Discounted Cumulative Gain
)for evaluation of our results.
• Calculated using the ranking of URLs for each query, and
then averaged over queries.
𝐷𝐶𝐺@10 =
𝑖=1
10
2 𝑟𝑒𝑙 𝑖−1
𝑙𝑜𝑔2(𝑖+1)
where 𝑟𝑒𝑙𝑖, 𝑖 = 1,2, … 10 is the relevance list that contain 10
URL’s relevance values. IDCG@10 is the maximum possible
(ideal) DCG for a given set of queries, documents, and
relevance value. Then, NDCG@10 is given by
𝑁𝐷𝐶𝐺@10 =
𝐷𝐶𝐺@10
𝐼𝐷𝐶𝐺@10
• Uploaded the final Result file online on Kaggle Portal for
Evaluation.
• Our Team (Satarupa Guha) Got NDCG Score 0.76857
• Team who secured first Position got 0.80725
Thank You !!!

More Related Content

Viewers also liked

First Grade Jeopardy Review
First Grade Jeopardy Review First Grade Jeopardy Review
First Grade Jeopardy Review
chatchis21
 
Huong dan-su-dung-phan-mem-ban-hang
Huong dan-su-dung-phan-mem-ban-hangHuong dan-su-dung-phan-mem-ban-hang
Huong dan-su-dung-phan-mem-ban-hang
Nguyễn Tuân
 
Procrastination in organization
Procrastination in organizationProcrastination in organization
Procrastination in organization
farahm3d
 
Scope management term paper
Scope management term paperScope management term paper
Scope management term paper
Muhammad Umar
 
Centrepoint presentation
Centrepoint presentationCentrepoint presentation
Centrepoint presentation
Muhammad Umar
 
Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...
Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...
Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...
Muhammad Umar
 
E business case study danish furniture
E  business case study danish furnitureE  business case study danish furniture
E business case study danish furniture
farahm3d
 
Mass transit system
Mass transit systemMass transit system
Mass transit system
Muhammad Umar
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Lucidworks
 

Viewers also liked (9)

First Grade Jeopardy Review
First Grade Jeopardy Review First Grade Jeopardy Review
First Grade Jeopardy Review
 
Huong dan-su-dung-phan-mem-ban-hang
Huong dan-su-dung-phan-mem-ban-hangHuong dan-su-dung-phan-mem-ban-hang
Huong dan-su-dung-phan-mem-ban-hang
 
Procrastination in organization
Procrastination in organizationProcrastination in organization
Procrastination in organization
 
Scope management term paper
Scope management term paperScope management term paper
Scope management term paper
 
Centrepoint presentation
Centrepoint presentationCentrepoint presentation
Centrepoint presentation
 
Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...
Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...
Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...
 
E business case study danish furniture
E  business case study danish furnitureE  business case study danish furniture
E business case study danish furniture
 
Mass transit system
Mass transit systemMass transit system
Mass transit system
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 

Similar to Re-ranking Web Documents Using Personal Preferences

Click Log Mining CS598
Click Log Mining CS598Click Log Mining CS598
Click Log Mining CS598
Shih-Wen Huang
 
10 personalized-web-search-techniques
10 personalized-web-search-techniques10 personalized-web-search-techniques
10 personalized-web-search-techniques
dipanjalishipne
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
Simon Hughes
 
Personalized Re-Ranking of Documents
Personalized Re-Ranking of DocumentsPersonalized Re-Ranking of Documents
Personalized Re-Ranking of Documents
kswapna9
 
SharePoint User Group Meeting- SharePoint 2013 Search
SharePoint User Group Meeting- SharePoint 2013 SearchSharePoint User Group Meeting- SharePoint 2013 Search
SharePoint User Group Meeting- SharePoint 2013 Search
C/D/H Technology Consultants
 
How To Build your own Custom Search Engine
How To Build your own Custom Search EngineHow To Build your own Custom Search Engine
How To Build your own Custom Search Engine
Richa Budhraja
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Lucidworks
 
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State UniversityLSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
dhabalia
 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2
GokulD
 
Architecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for BusinessArchitecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for Business
Ivo Andreev
 
Developing Search-driven application in SharePoint 2013
 Developing Search-driven application in SharePoint 2013  Developing Search-driven application in SharePoint 2013
Developing Search-driven application in SharePoint 2013
SPC Adriatics
 
SharePoint 2013 Search Based Solutions
SharePoint 2013 Search Based SolutionsSharePoint 2013 Search Based Solutions
SharePoint 2013 Search Based Solutions
SPC Adriatics
 
How seo works
How seo worksHow seo works
How seo works
Olatz Beitia
 
Optimising Queries - Series 1 Query Optimiser Architecture
Optimising Queries - Series 1 Query Optimiser ArchitectureOptimising Queries - Series 1 Query Optimiser Architecture
Optimising Queries - Series 1 Query Optimiser Architecture
DAGEOP LTD
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
S. Diana Hu
 
Artificial Intelligence for Data Quality
Artificial Intelligence for Data QualityArtificial Intelligence for Data Quality
Artificial Intelligence for Data Quality
Vera Ekimenko
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Codemotion
 
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Lippo Group Digital
 
SlideSeekerUVCE
SlideSeekerUVCESlideSeekerUVCE
SlideSeekerUVCE
rohitnbwlf
 

Similar to Re-ranking Web Documents Using Personal Preferences (20)

Click Log Mining CS598
Click Log Mining CS598Click Log Mining CS598
Click Log Mining CS598
 
10 personalized-web-search-techniques
10 personalized-web-search-techniques10 personalized-web-search-techniques
10 personalized-web-search-techniques
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
 
Personalized Re-Ranking of Documents
Personalized Re-Ranking of DocumentsPersonalized Re-Ranking of Documents
Personalized Re-Ranking of Documents
 
SharePoint User Group Meeting- SharePoint 2013 Search
SharePoint User Group Meeting- SharePoint 2013 SearchSharePoint User Group Meeting- SharePoint 2013 Search
SharePoint User Group Meeting- SharePoint 2013 Search
 
How To Build your own Custom Search Engine
How To Build your own Custom Search EngineHow To Build your own Custom Search Engine
How To Build your own Custom Search Engine
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
 
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State UniversityLSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2
 
Architecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for BusinessArchitecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for Business
 
Developing Search-driven application in SharePoint 2013
 Developing Search-driven application in SharePoint 2013  Developing Search-driven application in SharePoint 2013
Developing Search-driven application in SharePoint 2013
 
SharePoint 2013 Search Based Solutions
SharePoint 2013 Search Based SolutionsSharePoint 2013 Search Based Solutions
SharePoint 2013 Search Based Solutions
 
How seo works
How seo worksHow seo works
How seo works
 
Optimising Queries - Series 1 Query Optimiser Architecture
Optimising Queries - Series 1 Query Optimiser ArchitectureOptimising Queries - Series 1 Query Optimiser Architecture
Optimising Queries - Series 1 Query Optimiser Architecture
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
Artificial Intelligence for Data Quality
Artificial Intelligence for Data QualityArtificial Intelligence for Data Quality
Artificial Intelligence for Data Quality
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
 
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
 
SlideSeekerUVCE
SlideSeekerUVCESlideSeekerUVCE
SlideSeekerUVCE
 

Recently uploaded

Should Repositories Participate in the Fediverse?
Should Repositories Participate in the Fediverse?Should Repositories Participate in the Fediverse?
Should Repositories Participate in the Fediverse?
Paul Walk
 
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
uehowe
 
制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理
制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理
制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理
cuobya
 
Ready to Unlock the Power of Blockchain!
Ready to Unlock the Power of Blockchain!Ready to Unlock the Power of Blockchain!
Ready to Unlock the Power of Blockchain!
Toptal Tech
 
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
fovkoyb
 
HijackLoader Evolution: Interactive Process Hollowing
HijackLoader Evolution: Interactive Process HollowingHijackLoader Evolution: Interactive Process Hollowing
HijackLoader Evolution: Interactive Process Hollowing
Donato Onofri
 
办理新西兰奥克兰大学毕业证学位证书范本原版一模一样
办理新西兰奥克兰大学毕业证学位证书范本原版一模一样办理新西兰奥克兰大学毕业证学位证书范本原版一模一样
办理新西兰奥克兰大学毕业证学位证书范本原版一模一样
xjq03c34
 
不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作
不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作
不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作
bseovas
 
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
ukwwuq
 
Azure EA Sponsorship - Customer Guide.pdf
Azure EA Sponsorship - Customer Guide.pdfAzure EA Sponsorship - Customer Guide.pdf
Azure EA Sponsorship - Customer Guide.pdf
AanSulistiyo
 
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
vmemo1
 
一比一原版(USYD毕业证)悉尼大学毕业证如何办理
一比一原版(USYD毕业证)悉尼大学毕业证如何办理一比一原版(USYD毕业证)悉尼大学毕业证如何办理
一比一原版(USYD毕业证)悉尼大学毕业证如何办理
k4ncd0z
 
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
zoowe
 
Search Result Showing My Post is Now Buried
Search Result Showing My Post is Now BuriedSearch Result Showing My Post is Now Buried
Search Result Showing My Post is Now Buried
Trish Parr
 
成绩单ps(UST毕业证)圣托马斯大学毕业证成绩单快速办理
成绩单ps(UST毕业证)圣托马斯大学毕业证成绩单快速办理成绩单ps(UST毕业证)圣托马斯大学毕业证成绩单快速办理
成绩单ps(UST毕业证)圣托马斯大学毕业证成绩单快速办理
ysasp1
 
留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理
留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理
留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理
uehowe
 
Gen Z and the marketplaces - let's translate their needs
Gen Z and the marketplaces - let's translate their needsGen Z and the marketplaces - let's translate their needs
Gen Z and the marketplaces - let's translate their needs
Laura Szabó
 
Design Thinking NETFLIX using all techniques.pptx
Design Thinking NETFLIX using all techniques.pptxDesign Thinking NETFLIX using all techniques.pptx
Design Thinking NETFLIX using all techniques.pptx
saathvikreddy2003
 
快速办理(新加坡SMU毕业证书)新加坡管理大学毕业证文凭证书一模一样
快速办理(新加坡SMU毕业证书)新加坡管理大学毕业证文凭证书一模一样快速办理(新加坡SMU毕业证书)新加坡管理大学毕业证文凭证书一模一样
快速办理(新加坡SMU毕业证书)新加坡管理大学毕业证文凭证书一模一样
3a0sd7z3
 
办理毕业证(NYU毕业证)纽约大学毕业证成绩单官方原版办理
办理毕业证(NYU毕业证)纽约大学毕业证成绩单官方原版办理办理毕业证(NYU毕业证)纽约大学毕业证成绩单官方原版办理
办理毕业证(NYU毕业证)纽约大学毕业证成绩单官方原版办理
uehowe
 

Recently uploaded (20)

Should Repositories Participate in the Fediverse?
Should Repositories Participate in the Fediverse?Should Repositories Participate in the Fediverse?
Should Repositories Participate in the Fediverse?
 
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
 
制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理
制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理
制作毕业证书(ANU毕业证)莫纳什大学毕业证成绩单官方原版办理
 
Ready to Unlock the Power of Blockchain!
Ready to Unlock the Power of Blockchain!Ready to Unlock the Power of Blockchain!
Ready to Unlock the Power of Blockchain!
 
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
 
HijackLoader Evolution: Interactive Process Hollowing
HijackLoader Evolution: Interactive Process HollowingHijackLoader Evolution: Interactive Process Hollowing
HijackLoader Evolution: Interactive Process Hollowing
 
办理新西兰奥克兰大学毕业证学位证书范本原版一模一样
办理新西兰奥克兰大学毕业证学位证书范本原版一模一样办理新西兰奥克兰大学毕业证学位证书范本原版一模一样
办理新西兰奥克兰大学毕业证学位证书范本原版一模一样
 
不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作
不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作
不能毕业如何获得(USYD毕业证)悉尼大学毕业证成绩单一比一原版制作
 
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
 
Azure EA Sponsorship - Customer Guide.pdf
Azure EA Sponsorship - Customer Guide.pdfAzure EA Sponsorship - Customer Guide.pdf
Azure EA Sponsorship - Customer Guide.pdf
 
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
 
一比一原版(USYD毕业证)悉尼大学毕业证如何办理
一比一原版(USYD毕业证)悉尼大学毕业证如何办理一比一原版(USYD毕业证)悉尼大学毕业证如何办理
一比一原版(USYD毕业证)悉尼大学毕业证如何办理
 
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
 
Search Result Showing My Post is Now Buried
Search Result Showing My Post is Now BuriedSearch Result Showing My Post is Now Buried
Search Result Showing My Post is Now Buried
 
成绩单ps(UST毕业证)圣托马斯大学毕业证成绩单快速办理
成绩单ps(UST毕业证)圣托马斯大学毕业证成绩单快速办理成绩单ps(UST毕业证)圣托马斯大学毕业证成绩单快速办理
成绩单ps(UST毕业证)圣托马斯大学毕业证成绩单快速办理
 
留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理
留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理
留学挂科(UofM毕业证)明尼苏达大学毕业证成绩单复刻办理
 
Gen Z and the marketplaces - let's translate their needs
Gen Z and the marketplaces - let's translate their needsGen Z and the marketplaces - let's translate their needs
Gen Z and the marketplaces - let's translate their needs
 
Design Thinking NETFLIX using all techniques.pptx
Design Thinking NETFLIX using all techniques.pptxDesign Thinking NETFLIX using all techniques.pptx
Design Thinking NETFLIX using all techniques.pptx
 
快速办理(新加坡SMU毕业证书)新加坡管理大学毕业证文凭证书一模一样
快速办理(新加坡SMU毕业证书)新加坡管理大学毕业证文凭证书一模一样快速办理(新加坡SMU毕业证书)新加坡管理大学毕业证文凭证书一模一样
快速办理(新加坡SMU毕业证书)新加坡管理大学毕业证文凭证书一模一样
 
办理毕业证(NYU毕业证)纽约大学毕业证成绩单官方原版办理
办理毕业证(NYU毕业证)纽约大学毕业证成绩单官方原版办理办理毕业证(NYU毕业证)纽约大学毕业证成绩单官方原版办理
办理毕业证(NYU毕业证)纽约大学毕业证成绩单官方原版办理
 

Re-ranking Web Documents Using Personal Preferences

  • 1. Re-ranking Web Documents Based on Personal Preferences Satarupa Guha, Priyanshu Agrawal, Shubham Sangal IIIT Hyderabad
  • 2. • Each Individual is unique. • Search Engines should re- rank its results based on his profile so that his specific requirements can be met Personalisation… Why ??
  • 3. • Personalisation works using Individualisation and Contextualisation • Individualisation means creating each User’s Individual Profile using his Long-Term History • Contextualisation means Identifying relation between sessions using Short-Term History
  • 4. Dataset… Fully Anonymized • Dataset was provided by Yandex ( Almost 16 GB of train data and 400 MB of test Data) • Search Data for 30 days from a large city was collected. • To allay privacy concerns the user data is fully anonymized. • Only meaningless numeric IDs of users, queries, query terms, sessions, URLs and their domains are released. • Training was done on data of first 27 days and testing on last 3 days.
  • 5. Data Format The log represents a stream of user actions, with each line representing a session metadata, a query action, or a click action. Each line contains tab-separated values according to the following format: Session metadata (TypeOfRecord = M): SessionID TypeOfRecord Day USERID Query action (TypeOfRecord = Q or T): SessionID TimePassed TypeOfRecord SERPID QueryID ListOfTerms ListOfURLsAndDomains Click action (TypeOfRecord = C): SessionID TimePassed TypeOfRecord SERPID URLID
  • 6. Approach Feature Engineering Train Data Test Data Basic Features User-Specific Features Pair-wise Features Feature Representation Regression Model Trained Model Re-ranked URLs Train Data Test Data
  • 7. Feature Engineering • Initially each URL is labelled with a relevance label of: o 0 (irrelevant) : no clicks or click less than 50 time units o 1 (relevant): clicks with 50 to 399 time units o 2 (highly relevant): dwell time grater than 400 units • Basic Features used include URL-query instance such as original position of URL for query , URLId , DomainId of URLs , TermIds of query and various joint features such as URLId X position , URLId X QueryId , DomainId X TermId etc.
  • 8. • User Specific Features include for each user , dividing each URL in 5 categories based on relevance labels and whether URL was previously displayed or skipped to the User. • Pairwise Features include for each query checking the relative position of a pair of URL f(URLi, URLj) = 1 if URLi occurs at better position than URLj f(URLi, URLj) = -1 if URLj occurs at better position than URLi Feature Engineering
  • 9. Logistic Regression • Simple Logistic Regression Model was used. • URL with positive relevance was labelled 1. Rest labelled -1. • Relevance Labels assigned to each URL w.r.t to each User were used as Weight. • Out of Core learning Algorithms are used as they can work with huge Data in hours.
  • 10. Vowpal Wabbit for Logistic Regression • Fast Machine Learning tool. • Works on huge data in a parallel manner. • RAM efficient. • Fast Convergence to a Good Predictor. • Handles TBs of Data in Matter of Hours. Machine Learning
  • 11. Evaluation and Results • We used NDCG ( Normalized Discounted Cumulative Gain )for evaluation of our results. • Calculated using the ranking of URLs for each query, and then averaged over queries. 𝐷𝐶𝐺@10 = 𝑖=1 10 2 𝑟𝑒𝑙 𝑖−1 𝑙𝑜𝑔2(𝑖+1) where 𝑟𝑒𝑙𝑖, 𝑖 = 1,2, … 10 is the relevance list that contain 10 URL’s relevance values. IDCG@10 is the maximum possible (ideal) DCG for a given set of queries, documents, and relevance value. Then, NDCG@10 is given by 𝑁𝐷𝐶𝐺@10 = 𝐷𝐶𝐺@10 𝐼𝐷𝐶𝐺@10
  • 12. • Uploaded the final Result file online on Kaggle Portal for Evaluation. • Our Team (Satarupa Guha) Got NDCG Score 0.76857 • Team who secured first Position got 0.80725