Learning to Rank 101
Pere Urbon-Bayes
ROME - APRIL 13/14 2018
About me
Pere Urbon-Bayes (Berliner since 2011)
Software Architect and Data Engineer
All about systems, data and teams
Open Source Advocate and Contributor
All will be available from
● github.com/purbon/learning_to_rank_101
● speakerdeck.com/purbon
Building a new search functionality
Building Search
A search engine is an information retrieval
system designed to help find information stored
on a computer system.
wikipedia.org/wiki/Search_engine_(computing)
Building Search
When search works, it can feel almost
magical: you simply type in what you’re looking
for and it’s served up in mere milliseconds. It’s
fast, convenient, and super efficient – no
wonder so many users prefer search over
clicking around the site’s categories!
Search, how does this work?
Documents: D = {d1, d2, ..., dN}
Query: q
IR System: takes the query q and the documents D
List of documents (ranked): dq,1, dq,2, dq,3, dq,4, dq,5, ..., dq,n
Ranking based on relevance: TF-IDF, BM25
Building search
The phases of building a search engine:
● Tokenization
○ synonyms (filter)
○ stop words (filter)
○ whitespace
○ ngram
● Analyzer
○ languages
○ keywords
○ standard
● Normalization
Indexing Time
Query Time
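The phases above can be sketched as a small text-analysis function. This is a minimal illustration, not the pipeline of any particular search engine: the stop word list, the synonym map and the function names are all hypothetical. The same function would typically be applied at both indexing time and query time so that terms match.

```python
import re

# Hypothetical stop word list and synonym map, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and"}
SYNONYMS = {"tv": "television"}


def analyze(text):
    """Normalize, tokenize and filter a piece of text into index terms."""
    normalized = text.lower()                             # normalization: case folding
    tokens = re.findall(r"[a-z0-9]+", normalized)         # split on non-alphanumerics
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop word filter
    return [SYNONYMS.get(t, t) for t in tokens]           # synonym filter


def char_ngrams(token, n=3):
    """Character n-grams of a token, useful for partial matching."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


print(analyze("The History of the TV"))
print(char_ngrams("rank"))
```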
TF-IDF
Term Frequency - Inverse Document Frequency
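As a rough sketch of the idea, a term weighs more when it is frequent in a document and rare across the collection. The code below uses raw term frequency and idf = log(N / df), which is only one of several common variants; the toy corpus and the function name are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF weight of `term` in one tokenized document.

    Uses raw term frequency and idf = log(N / df), one common variant.
    """
    tf = Counter(doc_tokens)[term]
    df = sum(1 for tokens in corpus if term in tokens)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Toy corpus of already-tokenized documents (made up for this example).
corpus = [
    ["learning", "to", "rank"],
    ["rank", "documents", "by", "relevance"],
    ["search", "engine", "basics"],
]

# "rank" appears in two of three documents, "search" in only one,
# so "search" gets the higher weight in the document that contains it.
print(tf_idf("rank", corpus[0], corpus))
print(tf_idf("search", corpus[2], corpus))
```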
Okapi BM25
Okapi search Best Matching 25 (BM25)
Others: PageRank, Learning to Rank, ...
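A minimal sketch of BM25 scoring, assuming the widely used form with parameters k1 and b and the smoothed idf log((N - df + 0.5) / (df + 0.5) + 1). The default parameter values, the toy corpus and the function name are assumptions for illustration; real engines differ in details.

```python
import math


def bm25_score(query_tokens, doc_tokens, corpus, k1=1.2, b=0.75):
    """BM25 score of one tokenized document for a tokenized query."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_tokens:
        tf = doc_tokens.count(term)
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Term frequency saturates with k1; b controls length normalization.
        norm = tf + k1 * (1.0 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf * (k1 + 1.0) / norm
    return score


# Toy corpus (made up for this example).
corpus = [
    ["learning", "to", "rank", "with", "bm25"],
    ["rank", "rank", "rank", "documents"],
    ["cooking", "pasta", "at", "home"],
]
scores = [bm25_score(["rank"], doc, corpus) for doc in corpus]
print(scores)
```

Compared with plain TF-IDF, repeated occurrences of a term add less and less to the score, and longer documents are penalized relative to the average length.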
The second line of defence
● Tags and Ontologies.
● Natural Language Processing.
● Result click tracking.
● Genetic and evolutionary methods to optimize boosting and weights.
● Build your own scorer
● ...
Scary and Complex!!!
Building great search (can be an art)
Learning to Rank
Learning to Rank
The use of machine learning (supervised, semi-supervised, ...) to build
ranking models for information retrieval.
Common applications are in search engines, collaborative filtering,
machine translation, computational biology, etc.
The idea was introduced in 1992 by Norbert Fuhr, who described learning in
information retrieval as a parameter estimation problem.
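Learning to Rank, how does this work?
Documents: D = {d1, d2, ..., dN}
Training data (input to the Learning System):
q1: d1,1, d1,2, d1,3, ..., d1,n
...
qm: dm,1, dm,2, dm,3, ..., dm,n
Learning System: learns the ranking function f(q, d)
Query: qm+1
IR System: scores each document for the new query with f(q, d)
List of documents (ranked):
dq,1, f(qm+1, dq,1)
dq,2, f(qm+1, dq,2)
dq,3, f(qm+1, dq,3)
dq,4, f(qm+1, dq,4)
dq,5, f(qm+1, dq,5)
...
dq,n, f(qm+1, dq,n)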
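The flow in the diagram can be sketched in a few lines: learn f(q, d) from judged (query, document) pairs, then use it to order the candidates of a new query. This is only an illustration under strong assumptions. Each pair is represented by two made-up features (for example a text-match score and a popularity signal), the model is a plain linear function fitted by gradient descent, and all names and numbers are hypothetical.

```python
def fit_ranker(features, labels, lr=0.05, epochs=2000):
    """Fit f(q, d) = w . x + b by gradient descent on squared error."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def score(model, x):
    """Apply the learned function f to one feature vector."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def rank(model, candidates):
    """Indices of candidate documents, best first."""
    return sorted(range(len(candidates)),
                  key=lambda i: score(model, candidates[i]),
                  reverse=True)


# Hypothetical training data: one feature vector per judged (query, document)
# pair, with a graded relevance label (3 = best, 0 = not relevant).
train_x = [[2.1, 0.9], [1.7, 0.2], [0.3, 0.8], [0.2, 0.1]]
train_y = [3.0, 2.0, 1.0, 0.0]
model = fit_ranker(train_x, train_y)

# Candidate documents for a new, unseen query q(m+1).
new_query_docs = [[0.2, 0.1], [2.0, 0.8], [1.0, 0.5]]
print(rank(model, new_query_docs))
```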
Learning to Rank
Algorithms can be divided into three groups:
● Pointwise: each (query, document) pair gets a score, so the problem
can be approximated by regression.
● Pairwise: the problem is treated as classification, learning which
document of a given pair should be ranked higher.
● Listwise: these methods try to directly optimize a ranking evaluation
measure, averaged over all queries.
Typical order of quality: Listwise > Pairwise > Pointwise.
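One way to see the difference between the three groups is to look at what a single training example looks like in each. The sketch below turns the judged documents of one query into pointwise, pairwise and listwise training examples; the document ids, the relevance grades and the function names are made up for illustration.

```python
from itertools import combinations


def pointwise_examples(judged_docs):
    """One example per document: (doc_id, relevance), a regression target."""
    return [(doc_id, rel) for doc_id, rel in judged_docs]


def pairwise_examples(judged_docs):
    """One example per pair with different labels: (preferred, other)."""
    pairs = []
    for (a, rel_a), (b, rel_b) in combinations(judged_docs, 2):
        if rel_a != rel_b:
            pairs.append((a, b) if rel_a > rel_b else (b, a))
    return pairs


def listwise_example(judged_docs):
    """One example per query: the whole list in ideal order."""
    ordered = sorted(judged_docs, key=lambda d: d[1], reverse=True)
    return [doc_id for doc_id, _ in ordered]


# Judged documents for a single query (hypothetical ids and grades).
judged = [("d1", 2), ("d2", 0), ("d3", 1)]
print(pointwise_examples(judged))
print(pairwise_examples(judged))
print(listwise_example(judged))
```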
Learning to Rank
The most popular algorithms are:
● RankNet, LambdaRank, LambdaMART by Chris J.C. Burges and others.
www.microsoft.com/en-us/research/publication/ranking-boosting-and-model-adaptation/?from=http%3A%2F%2Fresearch.microsoft.com%2Fpubs%2F69536%2Ftr-2008-109.pdf
● RankSVM or (*) gradient descent variants.
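To give a feel for the pairwise family, here is a small sketch that trains a scorer with the pairwise logistic cost used by RankNet, log(1 + exp(-(s_i - s_j))), where document i should rank above document j. It is a simplification: RankNet itself uses a neural network, while this example swaps in a linear scorer trained by plain gradient descent, and the feature vectors are invented for the example.

```python
import math


def train_pairwise(pairs, n_features, lr=0.1, epochs=200):
    """Learn linear weights from (preferred, other) feature-vector pairs."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [a - b for a, b in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # d/dw log(1 + exp(-margin)) = -diff / (1 + exp(margin))
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            w = [wi - lr * grad_scale * di for wi, di in zip(w, diff)]
    return w


def linear_score(w, x):
    """Score of one document under the learned linear model."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical preference pairs: the first vector should outrank the second.
pairs = [
    ([2.0, 0.5], [1.0, 0.4]),
    ([1.5, 0.9], [1.4, 0.1]),
    ([0.9, 0.7], [0.2, 0.6]),
]
weights = train_pairwise(pairs, n_features=2)
print(weights)
```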
Not only for the big companies.
References
Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul
Lamere. The Million Song Dataset. In Proceedings of the 12th International
Society for Music Information Retrieval Conference (ISMIR 2011), 2011.
Million Song Dataset, official website by Thierry Bertin-Mahieux,
available at: http://labrosa.ee.columbia.edu/millionsong/
Tie-Yan Liu (2009), "Learning to Rank for Information Retrieval",
Foundations and Trends in Information Retrieval, 3 (3): 225–331,
doi:10.1561/1500000016, ISBN 978-1-60198-244-5.
Demo Time...
Thanks! Questions?
Pere Urbon Bayes — Data Wrangler
www.springernature.com
www.purbon.com

Bringing personalisation to data discovery, Learning to Rank 101 by Pere Urbon-Bayes
