WebMining Projectwork
How to suggest the query you’d like to
input after
The WebLog
AnonID  Query                        QueryTime         ItemRank  ClickURL
142     rentdirect.com               01/03/2006 07:17
142     www.prescriptionfortime.com  12/03/2006 12:31
142     staple.com                   17/03/2006 21:19
142     staple.com                   17/03/2006 21:19
142     www.newyorklawyersite.com    18/03/2006 08:02
142     www.newyorklawyersite.com    18/03/2006 08:03
142     westchester.gov              20/03/2006 03:55  1         http://www.westchestergov.com
142     space.comhttp                24/03/2006 20:51
The WebLog is the AOL query log that was made publicly available in 2006.
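A minimal sketch of how the log rows above could be read into (query, URL) pairs. It assumes the log is a tab-separated text file with the header shown on the slide; the function name `read_clicks` and the in-memory sample are illustrative, not part of the original project.

```python
import csv
import io


def read_clicks(lines):
    """Yield (query, url) pairs for the rows that recorded a click.

    Assumes tab-separated columns: AnonID, Query, QueryTime, ItemRank, ClickURL.
    Rows without a ClickURL (the user did not click a result) are skipped.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        query = (row.get("Query") or "").strip()
        url = (row.get("ClickURL") or "").strip()
        if query and url:
            yield query, url


# Two rows taken from the slide, rebuilt as a tab-separated sample.
sample = io.StringIO(
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "142\trentdirect.com\t01/03/2006 07:17\t\t\n"
    "142\twestchester.gov\t20/03/2006 03:55\t1\thttp://www.westchestergov.com\n"
)
pairs = list(read_clicks(sample))
print(pairs)
```

Only the second row yields a pair, because the first one has no clicked URL.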
The goal
Build a query suggestion application
that exploits the information observed in the AOL
WebLog.
Constraints:
1) The application relies on observed queries
2) The application needs to be fast!
The approach
Exploit the relation between the queries typed
and the URLs clicked by AOL users:
If two queries share “a lot of URLs”,
then they are strongly related to
each other.
“A lot of URLs”…
Several approaches can be followed for linking
observed queries to clicked URLs.
We were inspired by the paper “Query-URL Bipartite
Based Approach to Personalized Query
Recommendation” by Li, Yang, Liu and
Kitsuregawa, Proceedings of the Twenty-Third
AAAI Conference on Artificial Intelligence (2008).
Idea 1/2
Let q(i) be the i-th query and u(k) be the k-th
URL clicked after a query is typed.
A Bipartite Graph can be built such that each q(i)
in the query set is linked to every URL u(k)
that was clicked after it.
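The bipartite graph can be sketched as a mapping from each query to a counter of the URLs clicked after it. This is only one possible representation; the function name `build_bipartite`, the query "westchester county" and the URL "http://example.org/parks" are made up for illustration.

```python
from collections import Counter, defaultdict


def build_bipartite(clicks):
    """Map each query q(i) to a Counter {u(k): number of clicks after q(i)}."""
    graph = defaultdict(Counter)
    for query, url in clicks:
        graph[query][url] += 1
    return graph


# Hypothetical (query, url) pairs; only the first pair appears in the slide's sample.
clicks = [
    ("westchester.gov", "http://www.westchestergov.com"),
    ("westchester county", "http://www.westchestergov.com"),
    ("westchester county", "http://example.org/parks"),
]
graph = build_bipartite(clicks)

# The two queries are linked to one common URL node.
shared = graph["westchester.gov"].keys() & graph["westchester county"].keys()
print(shared)
```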
Idea 2/2
Once the Bipartite Graph has been built, a relation
between any two queries in the query set
can be established according to the clicked
URLs.
An Affinity Graph over the
query set can then be defined,
where the edge between two
queries has to be
weighted in order to
exploit it in a suggestion
task.
Weighting the Edges
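$$
w(i,j) = \frac{\sum_{k=1}^{U} \left(\text{number of times URL}(k)\text{ is clicked by } q(i) \text{ and } q(j)\right)}{\sum_{k=1}^{U} \left(\text{number of times any URL}(k)\text{ is clicked by } q(i)\right) + \sum_{k=1}^{U} \left(\text{number of times any URL}(k)\text{ is clicked by } q(j)\right)}
$$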
Let q(i) be the i-th query and u(k) be the k-th URL clicked
after a query is typed.
w(i,j) is equal to 1 if exactly the same URLs are clicked after q(i) and after q(j).
w(i,j) is equal to 0 if none of the URLs clicked after q(i) match those clicked after
q(j).
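A small sketch of the edge weight, under one stated assumption: the numerator is read as the clicks that q(i) and q(j) together gave to the URLs both of them clicked. That reading is not spelled out on the slide, but it reproduces the two boundary cases above (weight 1 for identical URL sets, weight 0 for disjoint ones). The function name `edge_weight` and the URL labels `u1`, `u2`, `u3` are hypothetical.

```python
from collections import Counter


def edge_weight(clicks_i, clicks_j):
    """Weight w(i,j) between two queries given their per-URL click Counters.

    Assumption: the numerator sums the clicks from both queries on the URLs
    they have in common; the denominator sums all clicks of both queries.
    """
    shared = clicks_i.keys() & clicks_j.keys()
    numerator = sum(clicks_i[u] + clicks_j[u] for u in shared)
    denominator = sum(clicks_i.values()) + sum(clicks_j.values())
    return numerator / denominator if denominator else 0.0


a = Counter({"u1": 2, "u2": 1})
b = Counter({"u1": 1, "u2": 3})   # same URLs as a
c = Counter({"u3": 5})            # no URL in common with a
d = Counter({"u1": 1, "u3": 1})   # one URL in common with a

print(edge_weight(a, b))  # same URL sets
print(edge_weight(a, c))  # disjoint URL sets
print(edge_weight(a, d))  # partial overlap: (2 + 1) / (3 + 2)
```

The weight is symmetric, so each pair of queries only needs to be evaluated once.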
Managing “over-clicked URLs”
In the AOL 2006 WebLog dataset a number
of URLs are over-clicked by users, regardless
of the query typed before clicking them.
[Figure: bar chart of click count per URL; the count axis runs from 0 to 180,000.]
Managing “over-clicked URLs”
Those URLs generate noise in the query recommendation
algorithm. For this reason we kept only the URLs with
fewer than 1,000 clicks.
[Figure: bar chart of click count per URL after filtering; the count axis runs from 0 to 1,000.]
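The filtering step could look like the sketch below: count the clicks per URL over the whole log, then keep only the pairs whose URL stays under the threshold. The function name `drop_overclicked` and the toy data are assumptions; the default threshold of 1,000 is the one stated on the slide.

```python
from collections import Counter


def drop_overclicked(clicks, max_clicks=1000):
    """Keep only (query, url) pairs whose URL has fewer than max_clicks clicks."""
    clicks = list(clicks)  # the data is traversed twice
    totals = Counter(url for _, url in clicks)
    return [(query, url) for query, url in clicks if totals[url] < max_clicks]


# Toy example with a threshold of 3 instead of 1,000.
toy = [
    ("q1", "popular"), ("q2", "popular"), ("q3", "popular"),
    ("q1", "rare"), ("q4", "rare"),
]
kept = drop_overclicked(toy, max_clicks=3)
print(kept)
```

Here "popular" has 3 clicks and is dropped, while "rare" has 2 and is kept.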
Affinity Graph Representation
Once the edge weights are computed, we build a main
dictionary with one entry per query: the key is q(i) and
the value is an ordered dictionary.
The keys of the ordered dictionary are the
queries sharing at least 1 URL with q(i), and its values
are the corresponding w(i,j).
The main dictionary is used to feed the query
suggestion API and provide a reliable result in
milliseconds.
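The dictionary-of-ordered-dictionaries could be built roughly as follows. The slide does not say how the inner dictionary is ordered, so sorting by descending weight is an assumption here, as are the names `build_affinity`, `suggest` and `edge_weight` and the toy queries; the weight function uses the same reading of the numerator as assumed for the edge formula.

```python
from collections import Counter, OrderedDict, defaultdict
from itertools import combinations


def edge_weight(ci, cj):
    """Assumed reading of w(i,j): shared-URL clicks over all clicks."""
    shared = ci.keys() & cj.keys()
    total = sum(ci.values()) + sum(cj.values())
    return sum(ci[u] + cj[u] for u in shared) / total if total else 0.0


def build_affinity(graph):
    """Return {q(i): OrderedDict{q(j): w(i,j)}}, best match first."""
    # Invert the bipartite graph so only queries sharing a URL are compared.
    by_url = defaultdict(set)
    for query, urls in graph.items():
        for url in urls:
            by_url[url].add(query)

    neighbours = defaultdict(dict)
    for queries in by_url.values():
        for qi, qj in combinations(sorted(queries), 2):
            if qj not in neighbours[qi]:
                w = edge_weight(graph[qi], graph[qj])
                neighbours[qi][qj] = w
                neighbours[qj][qi] = w

    return {
        q: OrderedDict(sorted(n.items(), key=lambda kv: kv[1], reverse=True))
        for q, n in neighbours.items()
    }


def suggest(affinity, query, n=5):
    """Look up the top-n related queries; a plain dictionary access."""
    return list(affinity.get(query, {}))[:n]


graph = {
    "cheap flights": Counter({"u1": 2, "u2": 1}),
    "low cost flights": Counter({"u1": 1, "u2": 3}),
    "flights and hotels": Counter({"u1": 1, "u3": 1}),
    "pizza": Counter({"u4": 5}),
}
affinity = build_affinity(graph)
print(suggest(affinity, "cheap flights"))
print(suggest(affinity, "pizza"))
```

Because all weights are precomputed, answering a suggestion request is a single dictionary lookup, which is consistent with the speed constraint stated earlier.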
Demo for those who can’t enjoy
the LIVE one
Thanks!
Andrea Gigli
https://about.me/andrea.gigli

Search Engine Query Suggestion Application

  • 1. WebMining Projectwork How to suggest the query you’d like to input after
  • 2. The WebLog AnonID Query QueryTime ItemRank ClickURL 142rentdirect.com 01/03/2006 07:17 142www.prescriptionfortime.com 12/03/2006 12:31 142staple.com 17/03/2006 21:19 142staple.com 17/03/2006 21:19 142www.newyorklawyersite.com 18/03/2006 08:02 142www.newyorklawyersite.com 18/03/2006 08:03 142westchester.gov 20/03/2006 03:55 1 http://www.westchesterg ov.com 142space.comhttp 24/03/2006 20:51 The WebLog is AOL weblog made available to public in 2006
  • 3. The goal Building a query suggestion application exploting the information observed on the AOL WebLog. Constrains: 1) the application relies on observed queries 2) The application needs to be fast!
  • 4. The approach Exploiting the relation between typed queries and clicked URL by AOL users: If two queries share “a lot or URLs” then they are strongly related to each other
  • 5. “a lot of URLs”…. Several approaches can be followed for linking observed queries to clicked URLs We’ve been inspired by “Query-URL Bipartite Based Approach to Personalized Query Recommendation” paper by Li, Yang, Liu, Kitsuregawa, Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008)
  • 6. Idea 1/2 Let q(i) be the i-th query and u(k) be the k-th clicked url after a query is typed A Bipartite Graph can be built such that for each q(i) belonging to the query set, a link to a subsequent clicked url u(k) can be defined
  • 7. Idea 2/2 Once a Bipartite Graph has been built, a relation between any query belonging to the query set can be established accordingly to the clicked URLs. An Affinity Graph over the query set can be defined consequently, where the edges between two queries have to be weighted in order to exploit it in a suggestion task
  • 8. Weighting the Edges
       $$w(i,j) = \frac{\sum_{k=1}^{U} \text{number of times URL}(k) \text{ is clicked by } q(i) \text{ and } q(j)}{\sum_{k=1}^{U} \text{number of times any URL}(k) \text{ is clicked by } q(i) + \sum_{k=1}^{U} \text{number of times any URL}(k) \text{ is clicked by } q(j)}$$
       Let q(i) be the i-th query and u(k) be the k-th URL clicked after a query is typed. w(i,j) is equal to 1 if exactly the same URLs are clicked after q(i) and after q(j) are typed. w(i,j) is equal to 0 if none of the URLs clicked after q(i) match those clicked after q(j).
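
One possible reading of the weight, sketched below: the numerator adds up the clicks both queries made on the URLs they share, and the denominator adds up all clicks of both queries. This reading is an assumption chosen because it reproduces the two boundary cases stated on the slide (1 for identical URL sets, 0 for disjoint ones); `edge_weight` and the sample counters are hypothetical.

```python
from collections import Counter

def edge_weight(clicks_i, clicks_j):
    """Weight w(i, j) between two queries given their per-URL click counts."""
    shared = clicks_i.keys() & clicks_j.keys()
    # Clicks from either query that landed on a URL both queries share.
    numerator = sum(clicks_i[url] + clicks_j[url] for url in shared)
    # All clicks recorded for q(i) plus all clicks recorded for q(j).
    denominator = sum(clicks_i.values()) + sum(clicks_j.values())
    return numerator / denominator if denominator else 0.0

same_a = Counter({"u1": 2, "u2": 1})
same_b = Counter({"u1": 1, "u2": 3})
disjoint = Counter({"u3": 4})
partial = Counter({"u1": 1, "u3": 1})

w_same = edge_weight(same_a, same_b)        # identical URL sets
w_disjoint = edge_weight(same_a, disjoint)  # no URL in common
w_partial = edge_weight(same_a, partial)    # only u1 in common
```

Under this reading the weight is symmetric, so each pair of queries needs to be evaluated only once.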
  • 9. Managing “over-clicked URLs” In the AOL 2006 WebLog dataset there are a number of URLs which are over-clicked by users, independently of the query they type before clicking them. [Chart: click count per URL; x-axis: URLs, y-axis: Click Count, from 0 to 180,000]
  • 10. Managing “over-clicked URLs” Those URLs generate noise in the query recommendation algorithm. For this reason we selected only those URLs having fewer than 1,000 clicks. [Chart: click count per URL after filtering; x-axis: URLs, y-axis: Click Count, from 0 to 1,000]
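
The filtering step might look like the sketch below, which counts the total clicks per URL and keeps only pairs whose URL stays under the 1,000-click threshold from the slide. The helper name `drop_overclicked` and the example.com sample data are hypothetical.

```python
from collections import Counter

MAX_CLICKS = 1000  # threshold from the slide: keep URLs with fewer than 1,000 clicks

def drop_overclicked(click_pairs, max_clicks=MAX_CLICKS):
    """Remove (query, URL) pairs whose URL was clicked max_clicks times or more."""
    url_totals = Counter(url for _, url in click_pairs)
    return [(q, u) for q, u in click_pairs if url_totals[u] < max_clicks]

# Invented data: one URL clicked 1,000 times from many queries, one clicked once.
sample = [("query %d" % n, "http://popular.example.com") for n in range(1000)]
sample.append(("rare query", "http://rare.example.com"))

kept = drop_overclicked(sample)
```

Filtering before building the graph avoids linking unrelated queries through a single very popular URL.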
  • 11. Affinity Graph Representation Once the edge weights are computed, we build a main dictionary in which each key is a query q(i) and each value is an ordered dictionary. The ordered dictionary has keys equal to the queries sharing at least 1 URL with q(i) and values equal to w(i,j). The main dictionary is used to feed the query suggestion API and return results in milliseconds.
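
A minimal sketch of this dictionary-of-ordered-dictionaries layout, assuming the neighbours are ordered by descending weight (the slide does not state the ordering) and using one possible reading of the weight formula. `edge_weight`, `build_affinity`, `suggest` and the sample data are all hypothetical names for illustration.

```python
from collections import Counter, OrderedDict
from itertools import combinations

def edge_weight(clicks_i, clicks_j):
    """Assumed weight: shared-URL clicks over all clicks of both queries."""
    shared = clicks_i.keys() & clicks_j.keys()
    numerator = sum(clicks_i[u] + clicks_j[u] for u in shared)
    denominator = sum(clicks_i.values()) + sum(clicks_j.values())
    return numerator / denominator if denominator else 0.0

def build_affinity(query_to_urls):
    """Main dictionary: query -> OrderedDict of related queries, best first."""
    neighbours = {q: {} for q in query_to_urls}
    for qi, qj in combinations(query_to_urls, 2):
        w = edge_weight(query_to_urls[qi], query_to_urls[qj])
        if w > 0:  # positive weight means at least one shared URL
            neighbours[qi][qj] = w
            neighbours[qj][qi] = w
    return {
        q: OrderedDict(sorted(n.items(), key=lambda kv: kv[1], reverse=True))
        for q, n in neighbours.items()
    }

def suggest(affinity, query, top_n=3):
    """Return up to top_n related queries; a plain dictionary lookup."""
    return list(affinity.get(query, {}))[:top_n]

query_to_urls = {
    "a": Counter({"u1": 2, "u2": 1}),
    "b": Counter({"u1": 1, "u2": 3}),
    "c": Counter({"u3": 4}),
    "d": Counter({"u1": 1, "u3": 1}),
}
affinity = build_affinity(query_to_urls)
suggestions = suggest(affinity, "a")
```

Because all weights are computed ahead of time, serving a suggestion is a single lookup. The pairwise loop here is quadratic in the number of queries; on a full log, an index from URL to queries would probably be needed to limit the pairs compared.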
  • 12. Demo for those who can’t enjoy the LIVE one