Everything
You Wish You Knew
About Search
About
• Enterprise software company that develops products for software
developers, project managers, and content management
• Our products:
About Me
Head of Search & Smarts Engineering at Atlassian
• In charge of all customer-facing ML/AI initiatives, including Search
• Our main initiative is Cross-Product Search in ‘Home’
Before Atlassian:
• Particle Physicist by training
• Initiated Data Science efforts at several companies
• Previously member of the Search team at @WalmartLabs
About this Talk
What to expect
• A general introduction to Search
• An overview of both the Engineering and ML aspects of Search
• Insights into the current and future challenges of Search
What not to expect
• An extensive tutorial covering the entire Learning-to-Rank landscape
• To become a Search expert in 40 min
Outline
• Part I: The Concepts of Search
• Part II: The Technical Aspects of Search
• Part III: Learning Algorithms
• Part IV: Measuring Search Relevance
• Part V: The Challenges and the Future of Search
Part I: The Concepts of Search
The (Pre)History of Search
• 1990: Archie. First search engine: an index of downloadable directory listings
• 1991: Veronica, Jughead. Search file names and titles stored in Gopher index systems
• 1992: VLib. Tim Berners-Lee set up a Virtual Library
• 1993: Excite; WWW Wanderer (primitive Web search)
• 1994: WebCrawler (1st crawler to index entire pages); Lycos (ranked relevance retrieval); Yahoo! Directory
• 1995: AltaVista (first to allow natural-language queries); LookSmart
• 1996: Inktomi: HotBot; Google
• 1997: Ask.com
The History of Search
• 1998: MSN; Open Directory Project
• 1999: AllTheWeb
• Later entries on the timeline (1999 to 2010): Overture Services, Snap, LiveSearch, Cuil, Bing, inline search suggestions
What is Search?
Convert an intent into an action that helps people
retrieve something, i.e. a piece of content
CONTENT OVERLOAD
Search
• Retrieving, organizing & classifying information
• Includes:
• Web Search
• Faceted Search (e-Commerce)
• Enterprise Search
• But also
• Different types of documents: Image Search, etc.
• In a wider sense of the term:
• Recommendation (Search with no explicit intent from the user)
• Structured Query Language
What is Search (Really) About?
• Users: each user's intent is expressed as a request, the search query
• Content: the documents being searched
• Steps: INTERPRETATION of the search query, RETRIEVAL of documents, DISPLAY of the returned search results
Multi-tenancy Search
• Users 1, 2 and 3 each search a separate document set (Documents 1, 2 and 3)
• Query space not controlled
• Content dependent on customer
Data Zoo For Search
Query data
• What are you searching for? (query terms)
Content data
• What are the documents about? (topics)
Contextual data
• Who are you? (user data – both static and learned)
• In which circumstances are you searching?
Engagement data
• As a group (what web pages are ‘hot’ these days?)
• As an individual (your personal viewing history)
The Processes of Search
CRAWLER
• Automated browser that views your web pages
• Strips out the HTML to keep the text content
INDEXER
• Stores records of all pages viewed by the spider/crawler
• Database being searched when the ‘search’ button is hit
SEARCHER
• Algorithm used to sort through the database of pages
• Finds the most relevant content
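The three processes can be sketched end to end in a few lines of Python. This is a toy illustration, not a production design: the PAGES dict stands in for real HTTP fetches, and the function names (crawl, build_index, search) and the scoring rule (count of matching query terms) are assumptions made for the example.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def crawl(pages):
    """CRAWLER: 'view' each page and strip out the HTML, keeping the text.

    `pages` is a hypothetical {url: html} dict standing in for real fetches.
    """
    texts = {}
    for url, html in pages.items():
        parser = _TextExtractor()
        parser.feed(html)
        parser.close()
        texts[url] = " ".join(parser.chunks)
    return texts


def build_index(texts):
    """INDEXER: store, for each term, the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in texts.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(url)
    return index


def search(index, query):
    """SEARCHER: order pages by how many distinct query terms they contain."""
    hits = defaultdict(int)
    for term in set(re.findall(r"[a-z0-9]+", query.lower())):
        for url in index.get(term, ()):
            hits[url] += 1
    return sorted(hits, key=lambda url: (-hits[url], url))


PAGES = {
    "a.html": "<html><body><h1>Cats</h1><p>Cats with sunglasses</p></body></html>",
    "b.html": "<html><body><p>Dogs with hats</p></body></html>",
}
INDEX = build_index(crawl(PAGES))
```

Calling search(INDEX, "cats with sunglasses") returns a.html ahead of b.html, because a.html matches three query terms and b.html only one.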
Part II: The Technical Aspects of Search
Search Engine Architecture
Components shown in the diagram:
• Indexing side: Crawler, Document Analyzer, Document Representation, Indexer, Index (indexed corpus)
• Query side: User, Query, Query representation, Ranker (ranking procedure), Results
• Evaluation and Feedback
Indexing
The purpose of storing an index is to optimize speed and performance in finding
relevant documents for a search query.
• Without an index, the search engine would scan every document in the corpus
• Benefits: computation and time saving at query time
• An index of 10,000 documents can be queried within milliseconds
• A sequential scan of every word in 10,000 large documents could take hours
• Disadvantages:
• additional computer storage required to store the index
• increase in the time required for an update to take place
• Design factors:
• Storage techniques
• Index size, lookup speed
• Maintenance, fault tolerance
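The trade-off above can be made concrete with a minimal sketch. The document texts and the helper names (sequential_scan, build_inverted_index, add_document) are made up for the example; a real engine would also handle tokenization, persistence and concurrency.

```python
from collections import defaultdict

DOCS = {
    1: "search engines build an index",
    2: "a sequential scan reads every document",
    3: "an index trades storage for lookup speed",
}


def sequential_scan(docs, term):
    """No index: tokenize every document at query time."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}


def build_inverted_index(docs):
    """Index: pay the tokenization cost once, at indexing time."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[token].add(doc_id)
    return index


def add_document(docs, index, doc_id, text):
    """An update must touch both the document store and the index."""
    docs[doc_id] = text
    for token in text.split():
        index[token].add(doc_id)


INVERTED = build_inverted_index(DOCS)
```

Both sequential_scan(DOCS, "index") and INVERTED["index"] give the same document ids, but the second is a single dictionary lookup. The price is the extra storage held in INVERTED and the extra work in add_document.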
What Happens at Indexing Time?
Indexing Process
• Text Acquisition: identifies and stores documents for indexing (e-mail, Web pages, news articles, memos, letters)
• Document data store: text + metadata (doc type, structure, features)
• Text Transformation: transforms documents into index terms or features
• Index Creation: takes index terms & creates data structures (inverted indexes) to support fast searching
• Index
Parsing
1. Identify What To Search For
Find out which words are being searched for and interpret the query terms
2. Parse The Query Language Itself
Recognize and interpret operators (AND, OR, NOT, etc.) and field restrictors
3. Extend Search to Other Query Terms
This includes:
• Fuzzy Matching (spelling mistakes)
• Entity and Thematic Modeling (related words)
4. Relevance Ranking Improvements
Such as:
• boosting documents containing all of the terms close together (proximity weighting)
• boosting documents from trustworthy sources, demoting documents from unreliable sites
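Steps 2 and 3 can be illustrated with a small sketch. The query syntax (field:term restrictors, upper-case operators) and the VOCABULARY list are assumptions made for the example; fuzzy matching here relies on the standard library's difflib, which is only one of many ways to tolerate spelling mistakes.

```python
import difflib

OPERATORS = {"AND", "OR", "NOT"}
VOCABULARY = ["search", "ranking", "index", "sunglasses"]  # hypothetical index terms


def parse_query(query):
    """Split a query into (kind, value) tokens.

    kind is 'op' for AND/OR/NOT, 'field' for field:term restrictors,
    and 'term' for everything else.
    """
    tokens = []
    for raw in query.split():
        if raw in OPERATORS:
            tokens.append(("op", raw))
        elif ":" in raw:
            field, _, term = raw.partition(":")
            tokens.append(("field", (field.lower(), term.lower())))
        else:
            tokens.append(("term", raw.lower()))
    return tokens


def fuzzy_expand(term, vocabulary=VOCABULARY, cutoff=0.8):
    """Map a possibly misspelled term to close matches in the vocabulary."""
    return difflib.get_close_matches(term, vocabulary, n=3, cutoff=cutoff)
```

For instance, parse_query("title:Search AND rankng") separates the field restrictor, the operator and the plain term, and fuzzy_expand("rankng") suggests "ranking" as the intended term.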
Ranking
Query: Cats with sunglasses
Relevance score $f(query, document) \in [0, 1]$
• 0.9: “Just hanging out with my sunglasses on. Am I cool or what?”
• 0.7: “Me with glasses just because… it makes me smart.”
• 0.3: “What I see right here is Jim Belushi as a cat. Along with the Blues Brothers behind.”
• 0.1: “You will never be as capable of rocking shades… quite as well as this feline friend.”
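A relevance function maps a {query, document} pair to a score between 0 and 1, and the results are sorted by that score. The sketch below uses a deliberately naive f (the share of distinct query terms found in the document) chosen only for illustration; it will not reproduce the scores shown on the slide, and real engines use far richer signals.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def relevance(query, document):
    """Toy f(query, document): share of distinct query terms found in the document."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    doc_terms = set(tokenize(document))
    return len(query_terms & doc_terms) / len(query_terms)


def rank(query, documents):
    """Return (score, document) pairs sorted by decreasing relevance."""
    scored = [(relevance(query, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


CAPTIONS = [
    "Me with glasses just because it makes me smart.",
    "Just hanging out with my sunglasses on",
    "You will never be as capable of rocking shades",
]
```

With the query "Cats with sunglasses", the second caption comes first (two of three terms match), then the first caption (one term), then the third (none).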
Reranking
Query Re-Ranking
Run a simple query (A) to match documents, then re-order the top N documents using the scores from a more complex query (B)
• Original rank (scores from query A): 0.9, 0.7, 0.3, 0.1
• Re-ranking (scores from query B for the top N documents): 1.0, 0.9, 0.5
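A minimal sketch of query re-ranking follows. The document ids and the assignment of query-B scores to documents are hypothetical (the slide only lists the score values); the point is that the expensive scorer is called for the top N results only, and the tail keeps its original order.

```python
def rerank(results, complex_score, top_n):
    """Re-order only the first `top_n` results using a second scoring function.

    `results` is a list of (doc_id, score) pairs already sorted by query A.
    `complex_score` maps a doc_id to the score of the more expensive query B.
    """
    head, tail = results[:top_n], results[top_n:]
    rescored = [(doc_id, complex_score(doc_id)) for doc_id, _ in head]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored + tail


# Scores from query A; document ids are hypothetical.
FIRST_PASS = [("d1", 0.9), ("d2", 0.7), ("d3", 0.3), ("d4", 0.1)]
# Hypothetical assignment of query-B scores to the top 3 documents.
QUERY_B = {"d1": 0.5, "d2": 1.0, "d3": 0.9}

RERANKED = rerank(FIRST_PASS, QUERY_B.__getitem__, top_n=3)
```

Here d4 is never scored by query B, which is what keeps the approach cheap when B is costly.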
Boosting and Personalization
Boosting
Run a simple query (A), then modify the {query, document} relevance scores to boost some content (for example, based on popularity, engagement, etc.)
New relevance = original relevance + α · page clicks + β · total dwell time (minutes), with α = 0.03, β = 0.01

Original rank   Original relevance   Page clicks   Total dwell time (minutes)   New relevance   New rank
1               0.9                  2,000         500                          65.9            3
2               0.7                  5,000         400                          154.7           2
3               0.3                  6,000         100                          181.3           1
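The boosted scores on this slide can be reproduced with a few lines of Python. The α and β values and the three rows come from the slide; the document names and the function name boosted_relevance are made up for the example.

```python
ALPHA = 0.03  # weight of page clicks (value from the slide)
BETA = 0.01   # weight of total dwell time in minutes (value from the slide)


def boosted_relevance(relevance, clicks, dwell_minutes, alpha=ALPHA, beta=BETA):
    """Original relevance plus weighted engagement signals."""
    return relevance + alpha * clicks + beta * dwell_minutes


# (name, original relevance, page clicks, total dwell time in minutes)
DOCS = [
    ("doc1", 0.9, 2000, 500),
    ("doc2", 0.7, 5000, 400),
    ("doc3", 0.3, 6000, 100),
]

NEW_SCORES = {
    name: boosted_relevance(rel, clicks, dwell) for name, rel, clicks, dwell in DOCS
}
NEW_RANK = sorted(NEW_SCORES, key=NEW_SCORES.get, reverse=True)
```

With these weights the engagement terms dominate the original relevance, so the order is fully reversed. In practice α and β would need tuning (or the signals normalizing) to avoid that effect when it is not wanted.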
Part III: Learning Algorithms
Learning-to-Rank (1)
Components shown in the diagram:
• Indexing: Documents, Indexer, Index
• Query time: User Query, Top-k retrieval, Ranking model, Results page
• Training: Training data, Learning algorithm, Ranking model
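Learning-to-Rank (2)
• Training Data: queries $q_1, \dots, q_n$; each query $q_i$ comes with documents $x_1^{(i)}, x_2^{(i)}, \dots, x_{m^{(i)}}^{(i)}$ and labels $y^{(i)}$
• Learning System: learns Model $h$ from the training data
• Test Data: a query $q$ with documents $x_1, x_2, \dots, x_m$ and unknown labels (?)
• Ranking System: applies Model $h$ to produce the Prediction $h(x)$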
Learning-to-Rank Algorithms
Pointwise
• Predict relevance on a document-by-document basis
• 3 types of supervised machine learning algorithms can be used:
• Regression-based algorithms
• Classification-based algorithms
• Ordinal regression
Pairwise
• Tell which document is better in a given pair of documents: it is a classification problem
• The goal is to minimize the average number of inversions in the ranking
Listwise
• Directly optimize one of the ranking evaluation measures
Pointwise Approach
• Predict the exact relevance degree of each document
• Assumes that each {query, document} pair has a numerical or ordinal score
• Input space contains the feature vector of every single document
• Can be approximated by a regression problem
• Ordinal regression:
• {query, document} relevance score can only take small, finite number of values
Pointwise Approach: Summary

                   Regression          Classification           Ordinal Regression
Input Space        Single documents $x_j$ (all three)
Output Space       Real values         Non-ordered categories   Ordinal categories
Hypothesis Space   Scoring function $f(x)$ (all three)
Loss Function      Regression loss     Classification loss      Ordinal regression loss
                   $L(f; x_j, y_j)$ (all three)
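The regression flavour of the pointwise approach can be sketched as follows: fit a scoring function f(x) on single documents with a squared loss, then sort documents by predicted score. The two-feature training set, the learning rate and the plain gradient-descent fit are assumptions chosen to keep the example dependency-free; any regression library could be swapped in.

```python
def fit_linear(features, labels, lr=0.1, epochs=2000):
    """Least-squares regression f(x) = w.x + b fitted by batch gradient descent."""
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            error = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i in range(dim):
                grad_w[i] += error * x[i] / n
            grad_b += error / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def score(model, x):
    """Scoring function f(x) applied to one document's feature vector."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical training set: one feature vector x_j and one relevance label y_j per document.
TRAIN_X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
TRAIN_Y = [2.0, 1.0, 3.0, 0.0]
MODEL = fit_linear(TRAIN_X, TRAIN_Y)

CANDIDATES = {"A": [1.0, 1.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
RANKING = sorted(CANDIDATES, key=lambda d: score(MODEL, CANDIDATES[d]), reverse=True)
```

Note that the loss only looks at one document at a time: the ranking is a by-product of sorting the predicted scores.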
Pairwise Algorithms
• Focus on relative order between 2 documents instead of predicting relevance
• Learn a binary classifier to tell which document is better in a pair of documents
• Goal: minimize the average number of inversions in the ranking
• Pairwise preference is used as the ground truth
• Limitations:
• Does not differentiate inversions at top vs. bottom positions
• Examples:
• RankNet
Summary
Input Space        Document pairs $(x_u, x_v)$
Output Space       Preference $y_{u,v} \in \{+1, -1\}$
Hypothesis Space   Preference function $h(x_u, x_v) = 2 \cdot I_{\{f(x_u) > f(x_v)\}} - 1$
Loss Function      Pairwise classification loss $L(h; x_u, x_v, y_{u,v})$
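The preference function and the inversion count can be written down directly. This sketch only evaluates a given scoring function f against pairwise ground truth; it does not train a model such as RankNet. The one-feature documents and the labels are hypothetical.

```python
from itertools import combinations


def preference(f, x_u, x_v):
    """h(x_u, x_v) = 2 * I{f(x_u) > f(x_v)} - 1, i.e. +1 or -1."""
    return 1 if f(x_u) > f(x_v) else -1


def inversion_rate(f, docs, labels):
    """Share of document pairs with different labels that f orders wrongly."""
    wrong = total = 0
    for u, v in combinations(range(len(docs)), 2):
        if labels[u] == labels[v]:
            continue  # no preference between equally relevant documents
        y_uv = 1 if labels[u] > labels[v] else -1
        total += 1
        wrong += preference(f, docs[u], docs[v]) != y_uv
    return wrong / total if total else 0.0


# Hypothetical one-feature documents and their graded relevance labels.
DOC_FEATURES = [3.0, 1.0, 2.0]
DOC_LABELS = [2, 1, 0]


def identity_score(x):
    """A deliberately simple scoring function f(x) = x."""
    return x
```

With these values one of the three pairs is inverted, so the inversion rate is 1/3. The measure treats every inversion alike, whether it happens at the top or the bottom of the list, which is the limitation noted above.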
Listwise Algorithms
• Pick an evaluation measure & optimize its value, averaged over all queries
• Challenges:
• Continuous approximations of the measures are used because most measures are not continuous functions
• 2 Types of approaches:
• Direct Optimization of IR Evaluation Measures
• Minimization of Listwise Ranking Losses
Summary
                   Listwise Loss Minimization              Direct Optimization of IR Measure
Input Space        Document set $\boldsymbol{x} = \{x_j\}_{j=1}^{m}$ (both)
Output Space       Permutation $\pi_y$                     Ordered categories $\boldsymbol{y} = \{y_j\}_{j=1}^{m}$
Hypothesis Space   $h(\boldsymbol{x}) = sort \circ f(\boldsymbol{x})$                  $h(\boldsymbol{x}) = f(\boldsymbol{x})$
Loss Function      Listwise loss $L(h; \boldsymbol{x}, \pi_y)$             1 − surrogate measure $L(h; \boldsymbol{x}, \boldsymbol{y})$
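As one example from the listwise loss minimization family, the sketch below follows the idea behind ListNet: turn both the predicted scores and the relevance labels of a whole document list into top-one probability distributions (a softmax) and compare them with cross entropy. ListNet is named here as an illustration and is not taken from the slides; the score lists are hypothetical.

```python
import math


def top_one_probabilities(scores):
    """Softmax over a list of scores (the top-one probability used by ListNet)."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(predicted_scores, relevance_labels):
    """Cross entropy between the label and predicted top-one distributions."""
    target = top_one_probabilities(relevance_labels)
    predicted = top_one_probabilities(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


LABELS = [2.0, 1.0, 0.0]        # graded relevance for one query's documents
GOOD_SCORES = [2.0, 1.0, 0.0]   # model agrees with the labels
BAD_SCORES = [0.0, 1.0, 2.0]    # model ranks the list in reverse
```

The loss is smooth in the scores, which sidesteps the non-continuity of rank-based measures, and it is lower for GOOD_SCORES than for BAD_SCORES.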
Summary
3 input ligands: A, B, C
Different methods:
• Pointwise: Score(A), Score(B), Score(C)
• Pairwise: f(A) > f(B), f(B) > f(C), f(A) > f(C)
• Listwise: P(A,B,C), P(B,A,C), P(B,C,A), …
Output: a ranking of A, B and C
Graph-Based Algorithms
Example: the PageRank Algorithm
• Link analysis algorithm
• Algorithm invented by Larry Page (Google co-founder)
• Score goes from 0 to 10
• Other Alternatives:
• Page Authority
• HostRank
• Voting Algorithms
• …
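The link-analysis idea behind PageRank can be sketched with a short power iteration. The three-page graph, the damping factor of 0.85 and the iteration count are assumptions for the example. The function returns probabilities that sum to 1; mapping them onto the 0 to 10 scale mentioned above is a separate rescaling step that is not reproduced here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration on a {page: [outgoing links]} graph."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: A and C both link to B, and B links back to A.
GRAPH = {"A": ["B"], "C": ["B"], "B": ["A"]}
RANKS = pagerank(GRAPH)
```

B ends up with the highest score because two pages link to it, A comes second thanks to the link from B, and C, which nobody links to, keeps only the baseline share.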
Features
Example features (TREC)

IR/NLP features                      Linkage / Engagement features
Rank  Feature                        Rank  Feature
1     TF of body                     …     …
2     TF of anchor                   51    PageRank
3     TF of title                    52    HostRank
4     TF of URL                      53    Topical PageRank
5     TF of whole document           54    Topical HITS authority
6     IDF of body                    55    Topical HITS hub
7     IDF of anchor                  56    Inlink number
8     IDF of title                   57    Outlink number
9     IDF of URL                     58    Number of slashes in URL
10    IDF of whole document          59    Length of URL

TF: term frequency
IDF: inverse document frequency
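TF and IDF are simple to compute. The sketch below uses raw counts for TF and log(N / df) for IDF, which is one common variant among several; the tiny corpus is made up for the example.

```python
import math
from collections import Counter


def term_frequency(term, document_tokens):
    """TF: number of times the term occurs in one document field."""
    return Counter(document_tokens)[term]


def inverse_document_frequency(term, corpus):
    """IDF: log(N / df), where df is the number of documents containing the term."""
    df = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / df) if df else 0.0


CORPUS = [
    "cats with sunglasses".split(),
    "cats and dogs".split(),
    "dogs with hats and more hats".split(),
]
```

A term that appears in fewer documents ("sunglasses") gets a higher IDF than a more common one ("cats"), which is why IDF is used to down-weight uninformative terms.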
Conventional Ranking Models
Query-dependent
• Boolean model, extended Boolean model, etc.
• Vector space model, latent semantic indexing (LSI), etc.
• BM25 model, statistical language model, etc.
Query-independent
• PageRank, TrustRank, BrowseRank, etc.
Problems with Conventional Models
• Manual parameter tuning difficult
• Too many parameters
• Evaluation measures not smooth
• Sometimes leads to overfitting
• Ensemble approach (combining models into a more effective one) not trivial
Part IV: Measuring Search Relevance
Search Engine Evaluation: Index
Corpus Size
• Number of pages indexed
Search engine overlap
• Fraction of pages indexed by engine A also indexed by engine B
Freshness
• Age of the pages in the index
Spam resilience
• Fraction of pages in index that are spam
Duplicates
• Number of unique pages in index
Search Engine Evaluation: Relevance Judgment
Types of judgments classified similarly to Ranking Algorithms
1. Degree of Relevance
• Binary: relevant vs. irrelevant
• Multiple ordered categories:
Perfect > Excellent > Good > Fair > Bad
2. Pairwise Preference
• Document A is more relevant than document B
3. Total Order
• Documents are ranked as {A,B,C,..} according to their relevance
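Evaluation Measure – MAP & NDCG
Precision at position k for query q:
$P@k = \frac{\#\{\text{relevant docs in top } k \text{ results}\}}{k}$
Average precision for query q:
$AP = \frac{\sum_{k} P@k \cdot l_k}{\#\{\text{relevant documents}\}}$, where $l_k = 1$ if the document at position k is relevant and 0 otherwise
NDCG at position k for query q:
$NDCG@k = Z_k \sum_{j=1}^{k} G(\pi^{-1}(j)) \, \eta(j)$
• Normalized: $Z_k$
• Cumulative: $\sum_{j=1}^{k}$
• Gain: $G(\pi^{-1}(j))$
• (Position) Discounted: $\eta(j)$
MAP & NDCG: Averaged over all queries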
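These measures are short to implement for a single query. In the sketch below the gain 2^grade − 1 and the discount 1 / log2(1 + position) are common choices assumed for the example (the slide leaves G and η generic), and average precision assumes every relevant document appears somewhere in the ranked list.

```python
import math


def precision_at_k(relevant, k):
    """`relevant` is a list of 0/1 flags in ranked order."""
    return sum(relevant[:k]) / k


def average_precision(relevant):
    """Sum of P@k over the positions k holding a relevant document,
    divided by the number of relevant documents."""
    hits = sum(relevant)
    if hits == 0:
        return 0.0
    total = sum(
        precision_at_k(relevant, k)
        for k, flag in enumerate(relevant, start=1)
        if flag
    )
    return total / hits


def dcg_at_k(grades, k):
    """Gain 2^grade - 1, discount 1 / log2(1 + position)."""
    return sum(
        (2 ** g - 1) / math.log2(1 + j)
        for j, g in enumerate(grades[:k], start=1)
    )


def ndcg_at_k(grades, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal else 0.0
```

MAP and the reported NDCG are then the means of average_precision and ndcg_at_k over all test queries.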
Evaluation Measure - Summary
Query-level: every query contributes equally to the measure
• Computed on documents associated with the same query
• Bounded for each query
• Averaged over all test queries
Position-based: rank position is explicitly used (weighting)
• Top-ranked objects more important
• Relative order vs. relevance score of each document
• Rank is a non-continuous, non-differentiable function of scores
Part V: The Challenges and the Future of Search
The Challenges of Enterprise Search
• Near duplicates and versioning
• More recently, “quoting” in-between websites
• Metadata and file formats
• Search across multiple sources
• How to merge several indexes?
• Challenges with latency?
• Security, Privacy, Regulations
Future Research
• User Logs as Ground Truth
• A gold mine that has not been leveraged so far
• Implicit feedback
• Click-through rates, etc.
• Feature Engineering
• New Directions of Research
• Semi-supervised Ranking
• Transfer Ranking
Conclusions
• While 20+ years old, Search is still hard
• But there are off-the-shelf solutions…
• A problem where ML can help (learning-to-rank space)
• Most promising algorithms use a listwise approach
• Very dynamic area of research
• But doing Search well requires more than Learning-to-Rank:
• Query Parsing, Topic modeling, etc.
• It is getting harder with ever more types of documents
Thank You for Your Attention!
References
• Learning-to-Rank for Information Retrieval, by Tie-Yan Liu
• Learning-to-Rank Tutorial, by Tie-Yan Liu
• The PageRank Model, by Ian Rogers
• Search is Hard, by Priyendra Deshwal
• Why Is Enterprise Search so Hard?, by Miles Kehoe

Blockchain Application in Real Estate TransactionsBlockchain Application in Real Estate Transactions
Blockchain Application in Real Estate Transactions
IDEAS - Int'l Data Engineering and Science Association
 
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
IDEAS - Int'l Data Engineering and Science Association
 
Practical Machine Learning at Work
Practical Machine Learning at WorkPractical Machine Learning at Work
Practical Machine Learning at Work
IDEAS - Int'l Data Engineering and Science Association
 
Artificial Intelligence: Hype, Reality, Vision.
Artificial Intelligence: Hype, Reality, Vision.Artificial Intelligence: Hype, Reality, Vision.
Artificial Intelligence: Hype, Reality, Vision.
IDEAS - Int'l Data Engineering and Science Association
 
Operationalizing your Data Lake: Get Ready for Advanced Analytics
Operationalizing your Data Lake: Get Ready for Advanced AnalyticsOperationalizing your Data Lake: Get Ready for Advanced Analytics
Operationalizing your Data Lake: Get Ready for Advanced Analytics
IDEAS - Int'l Data Engineering and Science Association
 
Introduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement LearningIntroduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement Learning
IDEAS - Int'l Data Engineering and Science Association
 
Best Practices in Data Partnerships Between Mayor's Office and Academia
Best Practices in Data Partnerships Between Mayor's Office and AcademiaBest Practices in Data Partnerships Between Mayor's Office and Academia
Best Practices in Data Partnerships Between Mayor's Office and Academia
IDEAS - Int'l Data Engineering and Science Association
 
AliMe Bot Platform Technical Practice - Alibaba`s Personal Intelligent Assist...
AliMe Bot Platform Technical Practice - Alibaba`s Personal Intelligent Assist...AliMe Bot Platform Technical Practice - Alibaba`s Personal Intelligent Assist...
AliMe Bot Platform Technical Practice - Alibaba`s Personal Intelligent Assist...
IDEAS - Int'l Data Engineering and Science Association
 
Data-Driven AI for Entertainment and Healthcare
Data-Driven AI for Entertainment and HealthcareData-Driven AI for Entertainment and Healthcare
Data-Driven AI for Entertainment and Healthcare
IDEAS - Int'l Data Engineering and Science Association
 
Generating Creative Works with AI
Generating Creative Works with AIGenerating Creative Works with AI
Using AI to Tackle the Future of Health Care Data
Using AI to Tackle the Future of Health Care DataUsing AI to Tackle the Future of Health Care Data
Using AI to Tackle the Future of Health Care Data
IDEAS - Int'l Data Engineering and Science Association
 
State of AI/ML in Real Estate
State of AI/ML in Real EstateState of AI/ML in Real Estate
Hot Dog, Not Hot Dog! Generate new training data without taking more photos.
Hot Dog, Not Hot Dog! Generate new training data without taking more photos.Hot Dog, Not Hot Dog! Generate new training data without taking more photos.
Hot Dog, Not Hot Dog! Generate new training data without taking more photos.
IDEAS - Int'l Data Engineering and Science Association
 
Machine Learning in Healthcare and Life Science
Machine Learning in Healthcare and Life ScienceMachine Learning in Healthcare and Life Science
Machine Learning in Healthcare and Life Science
IDEAS - Int'l Data Engineering and Science Association
 

More from IDEAS - Int'l Data Engineering and Science Association (20)

How to deliver effective data science projects
How to deliver effective data science projectsHow to deliver effective data science projects
How to deliver effective data science projects
 
Digital cracks in banking--Sid Nandi
Digital cracks in banking--Sid NandiDigital cracks in banking--Sid Nandi
Digital cracks in banking--Sid Nandi
 
“Full Stack” Data Science with R for Startups: Production-ready with Open-Sou...
“Full Stack” Data Science with R for Startups: Production-ready with Open-Sou...“Full Stack” Data Science with R for Startups: Production-ready with Open-Sou...
“Full Stack” Data Science with R for Startups: Production-ready with Open-Sou...
 
Battling Skynet: The Role of Humanity in Artificial Intelligence
Battling Skynet: The Role of Humanity in Artificial IntelligenceBattling Skynet: The Role of Humanity in Artificial Intelligence
Battling Skynet: The Role of Humanity in Artificial Intelligence
 
Implementing Artificial Intelligence with Big Data
Implementing Artificial Intelligence with Big DataImplementing Artificial Intelligence with Big Data
Implementing Artificial Intelligence with Big Data
 
Data Architecture (i.e., normalization / relational algebra) and Database Sec...
Data Architecture (i.e., normalization / relational algebra) and Database Sec...Data Architecture (i.e., normalization / relational algebra) and Database Sec...
Data Architecture (i.e., normalization / relational algebra) and Database Sec...
 
Blockchain Application in Real Estate Transactions
Blockchain Application in Real Estate TransactionsBlockchain Application in Real Estate Transactions
Blockchain Application in Real Estate Transactions
 
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
 
Practical Machine Learning at Work
Practical Machine Learning at WorkPractical Machine Learning at Work
Practical Machine Learning at Work
 
Artificial Intelligence: Hype, Reality, Vision.
Artificial Intelligence: Hype, Reality, Vision.Artificial Intelligence: Hype, Reality, Vision.
Artificial Intelligence: Hype, Reality, Vision.
 
Operationalizing your Data Lake: Get Ready for Advanced Analytics
Operationalizing your Data Lake: Get Ready for Advanced AnalyticsOperationalizing your Data Lake: Get Ready for Advanced Analytics
Operationalizing your Data Lake: Get Ready for Advanced Analytics
 
Introduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement LearningIntroduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement Learning
 
Best Practices in Data Partnerships Between Mayor's Office and Academia
Best Practices in Data Partnerships Between Mayor's Office and AcademiaBest Practices in Data Partnerships Between Mayor's Office and Academia
Best Practices in Data Partnerships Between Mayor's Office and Academia
 
AliMe Bot Platform Technical Practice - Alibaba`s Personal Intelligent Assist...
AliMe Bot Platform Technical Practice - Alibaba`s Personal Intelligent Assist...AliMe Bot Platform Technical Practice - Alibaba`s Personal Intelligent Assist...
AliMe Bot Platform Technical Practice - Alibaba`s Personal Intelligent Assist...
 
Data-Driven AI for Entertainment and Healthcare
Data-Driven AI for Entertainment and HealthcareData-Driven AI for Entertainment and Healthcare
Data-Driven AI for Entertainment and Healthcare
 
Generating Creative Works with AI
Generating Creative Works with AIGenerating Creative Works with AI
Generating Creative Works with AI
 
Using AI to Tackle the Future of Health Care Data
Using AI to Tackle the Future of Health Care DataUsing AI to Tackle the Future of Health Care Data
Using AI to Tackle the Future of Health Care Data
 
State of AI/ML in Real Estate
State of AI/ML in Real EstateState of AI/ML in Real Estate
State of AI/ML in Real Estate
 
Hot Dog, Not Hot Dog! Generate new training data without taking more photos.
Hot Dog, Not Hot Dog! Generate new training data without taking more photos.Hot Dog, Not Hot Dog! Generate new training data without taking more photos.
Hot Dog, Not Hot Dog! Generate new training data without taking more photos.
 
Machine Learning in Healthcare and Life Science
Machine Learning in Healthcare and Life ScienceMachine Learning in Healthcare and Life Science
Machine Learning in Healthcare and Life Science
 

Recently uploaded

Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 

Recently uploaded (20)

Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Everything You Wish You Knew About Search

  • 1. Everything You Wish You Knew About Search
  • 3. About • Enterprise software company that develops products for software developers, project managers, and content management • Our products:
  • 4. About Me Head of Search & Smarts Engineering at Atlassian • In charge of all customer-facing ML/AI initiatives, including Search • Our main initiative is Cross-Product Search in ‘Home’ Before Atlassian: • Particle Physicist by training • Initiated Data Science efforts at several companies • Previously member of the Search team at @WalmartLabs
  • 5. About this Talk What to expect • A general introduction to Search • An overview of both the Engineering and ML aspects of Search • Insights into the current and future challenges of Search What not to expect • An extensive tutorial covering the entire Learning-to-Rank landscape • To become a Search expert in 40 min
  • 6. Outline • Part I: The Concepts of Search • Part II: The Technical Aspects of Search • Part III: Learning Algorithms • Part IV: Measuring Search Relevance • Part V: The Challenges and the Future of Search
  • 7. Part I: The Concepts of Search
  • 8. The (Pre)History of Search AltaVista First to allow NL queries WebCrawler 1st crawler to index entire pages 1990 Archie First search engine: an index of downloadable directory listings 1991 Veronica, Jughead Search file names and titles stored in Gopher index systems 1992 VLib Tim Berners-Lee set up a Virtual Library 1993 Excite WWW Wanderer Primitive Web Search 1994 1995 LookSmart 1996 Inktomi: HotBot Google 1997 Ask.com Lycos Ranked relevance retrieval Yahoo! Directory
  • 9. The History of Search 1998 MSN Open Directory Project 1999 AllTheWeb Overture Services 2000 Snap 2003 2004 2001 2002 2005 2006 LiveSearch 2007 2008 2009 Cuil Bing Inline search suggestions 2010
  • 11. What is Search? Convert an intent into an action that helps people retrieve something, i.e. a piece of content [Figure: CONTENT OVERLOAD, Search] • Retrieving, organizing & classifying information • Includes: Web Search; Faceted Search (e-Commerce); Enterprise Search • But also different types of documents: Image Search, etc. • In a wider sense of the term: Recommendation (Search with no explicit intent from the user); Structured Query Language
  • 14. What is Search (Really) About? [Diagram: Users (User Intent), Request Search Query, Content (Documents), Return Search Results. Stage labels: INTERPRETATION, RETRIEVAL, DISPLAY]
  • 15. What is Search (Really) About? Multi-tenancy Search • Query space not controlled • Content dependent on customer [Diagram: three tenants (User 1, User 2, User 3), each with its own intent, search query and search results, and its own content (Documents 1, Documents 2, Documents 3). Stage labels: INTERPRETATION, RETRIEVAL, DISPLAY]
  • 16. Data Zoo For Search • Query data: what are you searching for? (query terms) • Content data: what are the documents about? (topics) • Contextual data: who are you? (user data, both static and learned); in which circumstances are you searching? • Engagement data: as a group (what web pages are ‘hot’ these days?); as an individual (your personal viewing history)
  • 19. The Processes of Search • CRAWLER: automated browser that views your web pages; strips out the HTML text content • INDEXER: stores records of all pages viewed by the spider/crawler; database being searched when the ‘search’ button is hit • SEARCHER: algorithm used to sort through the database of pages; finds the most relevant content
  • 20. Part II: The Technical Aspects of Search
  • 21. Search Engine Architecture [Diagram components: Crawler, Document Analyzer, Document Representation, Indexer, Index, Indexed corpus, User, Query, Query representation, Ranker, Ranking procedure, Results, Feedback, Evaluation]
  • 23. Indexing • The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query • Without an index, the search engine would scan every document in the corpus • Benefits: computation and time saving at query time • 10,000 documents can be queried within milliseconds with an index • a sequential scan could take hours • Disadvantages: • additional computer storage required to store the index • increase in the time required for an update to take place • Design factors: • Storage techniques • Index size, lookup speed • Maintenance, fault tolerance
  • 24. What Happens at Indexing Time? Indexing Process: • Text Acquisition: identifies and stores documents for indexing (e-mail, Web pages, news articles, memos, letters) • Document data store: text + metadata (doc type, structure, features) • Text Transformation: transforms documents into index terms or features • Index Creation: takes index terms & creates data structures (inverted indexes) to support fast searching • Index
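To make the indexing steps above concrete, here is a minimal sketch of an inverted index in Python. The `tokenize`, `build_inverted_index` and `search_all_terms` helpers and the three sample documents are illustrative assumptions, not code from the talk; production engines add stemming, positional information, compression and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Text transformation: lowercase the text and split it into index terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Index creation: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return the ids of documents that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical corpus (text acquisition is assumed to have happened already).
docs = {
    1: "Cats with sunglasses",
    2: "Just hanging out with my sunglasses on",
    3: "Me with glasses just because it makes me smart",
}
index = build_inverted_index(docs)
print(search_all_terms(index, "with sunglasses"))
```

A lookup touches only the posting sets of the query terms rather than every document, which is the time saving at query time described above; the extra `index` structure is the storage and update cost.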
  • 25. Parsing • 1. Identify What To Search For: find out what words get searched and interpret the query term • 2. Parse The Query Language Itself: recognizing and interpreting operators (AND, OR, NOT, etc.) and field restrictors • 3. Extend Search to Other Query Terms, including: fuzzy matching (spelling mistakes); entity and thematic modeling (related words) • 4. Relevance Ranking Improvements, such as: boosting documents containing all of the terms close together (proximity weighting); boosting documents from trustworthy sources and demoting documents from unreliable sites
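As one possible illustration of step 3 (fuzzy matching), the sketch below maps misspelled query terms onto the closest term in the index vocabulary, using `difflib.get_close_matches` from the Python standard library. The `expand_query_terms` helper, the sample vocabulary and the 0.8 cutoff are assumptions chosen for the example; real engines typically use edit-distance automata or n-gram indexes instead.

```python
import difflib


def expand_query_terms(query, vocabulary, cutoff=0.8):
    """Replace each unknown query term by its closest vocabulary term, if any."""
    expanded = []
    for term in query.lower().split():
        if term in vocabulary:
            expanded.append(term)
            continue
        # get_close_matches returns the best matches, most similar first.
        matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
        expanded.append(matches[0] if matches else term)
    return expanded


# Hypothetical index vocabulary.
vocabulary = ["cats", "sunglasses", "glasses", "feline"]
print(expand_query_terms("cats with sunglases", vocabulary))
```

Terms with no sufficiently close match (here "with") are passed through unchanged, so the fuzzy step never removes information from the query.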
  • 29. Ranking • Cats with sunglasses • Relevance score f(query, document) ∈ [0, 1]: 0.9, 0.7, 0.3, 0.1 • Image captions: Just hanging out with my sunglasses on Am I cool or what? Me with glasses just because… it makes me smart. What I see right here is Jim Belushi as a cat. Along with the Blues Brothers behind. You will never be as capable of rocking shades… quite as well as this feline friend.
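The idea of a relevance function f(query, document) with values in [0, 1] can be sketched with a deliberately naive scorer: the fraction of query terms that appear in the document. This is an assumption made purely for illustration; it is not the scoring function behind the 0.9 / 0.7 / 0.3 / 0.1 values on the slide and it will not reproduce them.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query, document):
    """Toy f(query, document): share of query terms found in the document."""
    query_terms = tokens(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokens(document)) / len(query_terms)


def rank(query, documents):
    """Score every document and sort by descending relevance."""
    scored = [(relevance(query, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


captions = [
    "Am I cool or what?",
    "Just hanging out with my sunglasses on",
    "Me with glasses just because it makes me smart.",
]
for score, caption in rank("cats with sunglasses", captions):
    print(f"{score:.2f}  {caption}")
```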
  • 33. Reranking • Query Re-Ranking: run a simple query (A) to match documents, then re-order the top N documents using the scores from a more complex query (B) • Original rank scores: 0.9, 0.7, 0.3, 0.1 • Re-ranking scores for the top N documents: 1.0, 0.9, 0.5
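A minimal sketch of query re-ranking, under stated assumptions: the first-pass scores (0.9, 0.7, 0.3, 0.1) and the second-pass scores (1.0, 0.9, 0.5) come from the slide, but the document names and which document receives which second-pass score are hypothetical, since the slide transcript does not say.

```python
def rerank_top_n(results, expensive_score, n):
    """Re-order the first n results by a costlier score; keep the tail as is.

    results: list of (doc_id, cheap_score), sorted by cheap_score descending.
    expensive_score: callable mapping a doc_id to the score of query (B).
    """
    head, tail = results[:n], results[n:]
    rescored = [(doc_id, expensive_score(doc_id)) for doc_id, _ in head]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored + tail


# First pass: simple query (A).
first_pass = [("doc_a", 0.9), ("doc_b", 0.7), ("doc_c", 0.3), ("doc_d", 0.1)]

# Second pass: complex query (B), only evaluated on the top 3 documents.
# The assignment of scores to documents is an assumption for this example.
complex_scores = {"doc_a": 0.5, "doc_b": 1.0, "doc_c": 0.9}

reranked = rerank_top_n(first_pass, complex_scores.get, n=3)
print(reranked)
```

The point of the design is cost control: the expensive scorer runs on N documents instead of on every match, at the price of never promoting anything that the cheap query ranked below position N.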
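  • 38. Boosting and Personalization • Boosting: running a simple query (A) and modifying the {query, document} relevance scores to boost some content (for example, based on popularity, engagement, etc.) • $\text{new relevance} = \text{original relevance} + \alpha \cdot \text{page clicks} + \beta \cdot \text{total dwell time (minutes)}$, with $\alpha = 0.03$, $\beta = 0.01$ • Original rank 1: relevance 0.9, 2,000 page clicks, 500 minutes dwell time, new relevance = 65.9 • Original rank 2: relevance 0.7, 5,000 page clicks, 400 minutes dwell time, new relevance = 154.7 • Original rank 3: relevance 0.3, 6,000 page clicks, 100 minutes dwell time, new relevance = 181.3 • New rank: the order is reversed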
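The boosting arithmetic on this slide can be checked with a few lines of Python. The numbers (relevance, page clicks, dwell time, α = 0.03, β = 0.01) are taken from the slide; the function name and document labels are assumptions made for the example.

```python
def boosted_relevance(relevance, page_clicks, dwell_minutes, alpha=0.03, beta=0.01):
    """Additive boost: relevance + alpha * clicks + beta * dwell time."""
    return relevance + alpha * page_clicks + beta * dwell_minutes


# (label, original relevance, page clicks, total dwell time in minutes)
docs = [
    ("doc_1", 0.9, 2000, 500),
    ("doc_2", 0.7, 5000, 400),
    ("doc_3", 0.3, 6000, 100),
]

boosted = [(name, round(boosted_relevance(r, c, d), 1)) for name, r, c, d in docs]
new_order = [name for name, _ in sorted(boosted, key=lambda p: p[1], reverse=True)]

print(boosted)
print(new_order)
```

With these weights the engagement terms are two orders of magnitude larger than the original relevance, so the new ranking is driven almost entirely by clicks; choosing α and β (or normalizing the engagement signals first) is therefore the real design decision.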
  • 39. Part III: Learning Algorithms
  • 40. Learning-to-Rank (1) [Diagram components: User, Query, Top-k retrieval, Ranking model, Results page, Learning algorithm, Training data, Documents, Indexer, Index]
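  • 41. Learning-to-Rank (2) • Training Data: queries $q_1, \dots, q_n$, where query $q_i$ comes with document feature vectors $x_1^{(i)}, x_2^{(i)}, \dots, x_{m^{(i)}}^{(i)}$ and labels $y^{(i)}$ • Learning System: produces the Model $h$ • Test Data: a query $q$ with documents $x_1, x_2, \dots, x_m$ and unknown labels (?) • Ranking System: applies the model to obtain the Prediction $h(x)$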
  • 44. Learning-to-Rank Algorithms • Pointwise: predict relevance on a document-by-document basis; 3 types of supervised machine learning algorithms can be used: regression-based algorithms, classification-based algorithms, ordinal regression • Pairwise: tell which document is better in a given pair of documents (a classification problem); the goal is to minimize the average number of inversions in the ranking • Listwise: directly optimize one of the ranking evaluation measures
  • 46. Pointwise Approach • Predict the exact relevance degree of each document • Assumes that each {query, document} pair has a numerical or ordinal score • Input space contains the feature vector of every single document • Can be approximated by a regression problem • Ordinal regression: the {query, document} relevance score can only take a small, finite number of values
      Summary (columns: Regression / Classification / Ordinal Regression)
      Input space: single documents (all three)
      Output space $y_j$: real values / non-ordered categories / ordinal categories
      Hypothesis space: scoring function $f(x)$ (all three)
      Loss function $L(f; x_j, y_j)$: regression loss / classification loss / ordinal regression loss
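The regression flavor of the pointwise approach can be sketched as follows: fit a scoring function $f(x)$ to per-document relevance labels, then sort documents by predicted score. This is a minimal sketch, assuming a linear model trained by batch gradient descent on squared loss; the two features, the toy labels, and the names `train_pointwise` and `rank_documents` are illustrative assumptions, not part of the talk.

```python
def predict(weights, bias, features):
    """Linear scoring function f(x) = w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_pointwise(features, labels, learning_rate=0.1, epochs=2000):
    """Fit f(x) to relevance labels with batch gradient descent on squared loss."""
    n_samples = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict(weights, bias, x) - y
            for i, x_i in enumerate(x):
                grad_w[i] += 2.0 * error * x_i / n_samples
            grad_b += 2.0 * error / n_samples
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


def rank_documents(weights, bias, candidates):
    """Score each document independently, then sort by descending score."""
    return sorted(
        candidates,
        key=lambda name: predict(weights, bias, candidates[name]),
        reverse=True,
    )


# Toy training set: each row is one {query, document} feature vector
# (hypothetical features: term match in title, term match in body).
train_x = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
train_y = [2.0, 1.0, 3.0, 0.0]  # graded relevance labels

weights, bias = train_pointwise(train_x, train_y)
candidates = {"doc_a": (0.0, 1.0), "doc_b": (1.0, 1.0), "doc_c": (1.0, 0.0)}
print(rank_documents(weights, bias, candidates))
```

Note that the loss only looks at one document at a time: the model never sees how documents compare with each other, which is the defining trait of the pointwise family.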
  • 48. Pairwise Algorithms • Focus on the relative order between two documents instead of predicting relevance • Learn a binary classifier that tells which document is better in a pair of documents • Goal: minimize the average number of inversions in the ranking • Pairwise preference is used as the ground truth • Limitation: does not differentiate inversions at top vs. bottom positions • Example: RankNet
      Summary
      Input space: document pairs $(x_u, x_v)$
      Output space: preference $y_{u,v} \in \{+1, -1\}$
      Hypothesis space: preference function $h(x_u, x_v) = 2 \cdot I_{\{f(x_u) > f(x_v)\}} - 1$
      Loss function: pairwise classification loss $L(h; x_u, x_v, y_{u,v})$
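The preference function and the inversion count from this slide can be written out directly. The sketch below is an illustration under assumptions: the helper names and the toy scores are hypothetical, and it counts inversions rather than training a classifier such as RankNet.

```python
from itertools import combinations


def preference(score_u, score_v):
    """h(x_u, x_v) = 2 * I{f(x_u) > f(x_v)} - 1, returning +1 or -1."""
    return 1 if score_u > score_v else -1


def count_inversions(scores, labels):
    """Count document pairs whose predicted preference contradicts the labels."""
    inversions = 0
    for u, v in combinations(range(len(scores)), 2):
        if labels[u] == labels[v]:
            continue  # no ground-truth preference for tied labels
        truth = 1 if labels[u] > labels[v] else -1
        if preference(scores[u], scores[v]) != truth:
            inversions += 1
    return inversions


labels = [3, 2, 1, 0]                    # ground-truth relevance grades
perfect = [0.9, 0.8, 0.5, 0.1]           # scores that agree with the labels
swapped_at_top = [0.8, 0.9, 0.5, 0.1]    # first two documents inverted
swapped_at_bottom = [0.9, 0.8, 0.1, 0.5] # last two documents inverted

print(count_inversions(perfect, labels))
print(count_inversions(swapped_at_top, labels))
print(count_inversions(swapped_at_bottom, labels))
```

The two swapped score lists each produce exactly one inversion, which illustrates the limitation listed on the slide: a mistake at the top of the results costs the same as a mistake at the bottom.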
  • 50. Listwise Algorithms • Pick an evaluation measure and optimize its value, averaged over all queries • Challenge: continuous approximations of the measures are needed because most measures are not continuous functions • Two types of approaches: direct optimization of IR evaluation measures, and minimization of listwise ranking losses
      Summary (columns: Listwise Loss Minimization / Direct Optimization of IR Measure)
      Input space: document set $\mathbf{x} = \{x_j\}_{j=1}^{m}$ (both)
      Output space: permutation $\pi_y$ / ordered categories $\mathbf{y} = \{y_j\}_{j=1}^{m}$
      Hypothesis space: $h(\mathbf{x}) = \mathrm{sort} \circ f(\mathbf{x})$ / $h(\mathbf{x}) = f(\mathbf{x})$
      Loss function: listwise loss $L(h; \mathbf{x}, \pi_y)$ / 1 − surrogate measure $L(h; \mathbf{x}, \mathbf{y})$
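To make "listwise ranking loss" concrete, here is a minimal sketch of one such loss. It assumes a ListNet-style top-one formulation (cross-entropy between the softmax of the labels and the softmax of the predicted scores); the slide does not name a specific loss, so this choice and the function names are assumptions for illustration.

```python
import math


def softmax(values):
    """Turn a list of scores into a probability distribution."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, labels):
    """Cross-entropy between label-induced and score-induced distributions.

    The loss is computed over the whole list of documents for one query,
    rather than over single documents or document pairs.
    """
    target = softmax(labels)
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


labels = [2.0, 1.0, 0.0]        # graded relevance for one query's documents
good_scores = [2.0, 1.0, 0.0]   # model agrees with the labels
bad_scores = [0.0, 1.0, 2.0]    # model reverses the order

print(listwise_loss(good_scores, labels))
print(listwise_loss(bad_scores, labels))
```

Because this loss is a smooth function of the scores, it can be minimized with gradient methods, which sidesteps the non-continuity of the evaluation measures mentioned on the slide.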
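  • 51. Summary: Different Methods [Figure: three input items A, B, and C (labeled "ligands" in the figure). Pointwise: each item is scored on its own, Score(A), Score(B), Score(C). Pairwise: items are compared in pairs, f(A) > f(B), f(B) > f(C), f(A) > f(C). Listwise: whole orderings are scored, $P_{A,B,C}$, $P_{B,A,C}$, $P_{B,C,A}$. Output ranking: A, B, C]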
  • 52. Graph-Based Algorithms • Link analysis algorithms. Example: the PageRank algorithm • Invented by Larry Page (Google co-founder) • Score goes from 0 to 10 • Alternatives: Page Authority, HostRank, voting algorithms, … • [Diagram: link graph with nodes labeled A, B, and C]
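A link analysis algorithm of this kind can be sketched with power iteration. The code below is a simplified illustration, not Google's implementation: the damping factor of 0.85, the uniform handling of pages without outlinks, and the three-page graph are assumptions. The raw scores it returns are probabilities that sum to 1, not the 0 to 10 scale mentioned on the slide.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict of page -> list of outlinked pages."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives a baseline share from the random-jump term.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page without outlinks: spread its rank over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, C collects links from both A and B and therefore ends up with the highest score, while B, which only receives half of A's vote, ends up lowest.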
  • 53. Features: example features (TREC). TF: term frequency. IDF: inverse document frequency. Feature groups labeled on the slide: IR/NLP features, Linkage, Engagement
      Rank 1: TF of body
      Rank 2: TF of anchor
      Rank 3: TF of title
      Rank 4: TF of URL
      Rank 5: TF of whole document
      Rank 6: IDF of body
      Rank 7: IDF of anchor
      Rank 8: IDF of title
      Rank 9: IDF of URL
      Rank 10: IDF of whole document
      …
      Rank 51: PageRank
      Rank 52: HostRank
      Rank 53: Topical PageRank
      Rank 54: Topical HITS authority
      Rank 55: Topical HITS hub
      Rank 56: Inlink number
      Rank 57: Outlink number
      Rank 58: Number of slashes in URL
      Rank 59: Length of URL
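The TF and IDF features in this table are computed per document field (body, title, URL, and so on). The sketch below shows one way to do that, assuming raw counts for TF and the plain logarithmic form log(N / df) for IDF; several IDF variants exist, and the slide does not say which one is meant. The field names and the tiny corpus are hypothetical.

```python
import math


def term_frequency(term, tokens):
    """Number of occurrences of the term in one field of one document."""
    return tokens.count(term)


def inverse_document_frequency(term, field_corpus):
    """log(N / df), where df is the number of documents containing the term."""
    doc_freq = sum(1 for tokens in field_corpus if term in tokens)
    if doc_freq == 0:
        return 0.0  # term never appears in this field
    return math.log(len(field_corpus) / doc_freq)


# Hypothetical corpus: each document has a tokenized title and body.
corpus = [
    {"title": ["search", "basics"], "body": ["search", "engines", "rank", "search", "results"]},
    {"title": ["ranking", "models"], "body": ["models", "rank", "documents"]},
    {"title": ["cooking", "tips"], "body": ["recipes", "and", "tips"]},
]

term = "search"
bodies = [doc["body"] for doc in corpus]
titles = [doc["title"] for doc in corpus]

features = {
    "tf_body": term_frequency(term, corpus[0]["body"]),
    "tf_title": term_frequency(term, corpus[0]["title"]),
    "idf_body": inverse_document_frequency(term, bodies),
    "idf_title": inverse_document_frequency(term, titles),
}
print(features)
```

Each {query, document} pair gets one such feature vector, which is what the learning-to-rank algorithms from Part III consume as input.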
  • 54. Conventional Ranking Models • Query-dependent: Boolean model, extended Boolean model, etc.; vector space model, latent semantic indexing (LSI), etc.; BM25 model, statistical language model, etc. • Query-independent: PageRank, TrustRank, BrowseRank, etc. • Problems with conventional models: manual parameter tuning is difficult (too many parameters, evaluation measures not smooth) and sometimes leads to overfitting; an ensemble approach (combining models into a more effective one) is not trivial
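BM25 is a good example of a conventional query-dependent model with hand-tuned parameters. The sketch below assumes the commonly used formulation with parameters `k1` and `b` and a smoothed IDF term; the default values (1.2 and 0.75) are conventional choices rather than values from the talk, and the toy corpus is hypothetical.

```python
import math


def bm25_score(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score one document against a query with BM25.

    k1 controls term-frequency saturation; b controls length normalization.
    Both are usually tuned by hand, which is the difficulty the slide mentions.
    """
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)
        if tf == 0:
            continue
        doc_freq = sum(1 for doc in corpus if term in doc)
        idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
        length_norm = 1.0 - b + b * len(doc_tokens) / avg_len
        score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
    return score


corpus = [
    ["search", "engine", "ranking"],
    ["search", "search", "index"],
    ["cooking", "recipes"],
]
query = ["search"]

for doc in corpus:
    print(doc, round(bm25_score(query, doc, corpus), 3))
```

Changing `k1` or `b` changes the scores, and there is no smooth evaluation measure to guide that tuning, which is part of the motivation for learning the ranking function instead.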
  • 55. Part IV: Measuring Search Relevance
  • 56. Search Engine Evaluation: Index • Corpus size: number of pages indexed • Search engine overlap: fraction of pages indexed by engine A that are also indexed by engine B • Freshness: age of the pages in the index • Spam resilience: fraction of pages in the index that are spam • Duplicates: number of unique pages in the index
  • 57. Search Engine Evaluation: Relevance Judgment • Judgment types parallel the three families of ranking algorithms • 1. Degree of relevance: binary (relevant vs. irrelevant) or multiple ordered categories (Perfect > Excellent > Good > Fair > Bad) • 2. Pairwise preference: document A is more relevant than document B • 3. Total order: documents are ranked as {A, B, C, ...} according to their relevance
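  • 58. Evaluation Measure: MAP & NDCG • Precision at position $k$ for query $q$: $P@k = \frac{\#\{\text{relevant docs in top } k \text{ results}\}}{k}$ • Average precision for query $q$: $AP = \frac{\sum_{k} P@k \cdot l_k}{\#\{\text{relevant documents}\}}$, where $l_k$ indicates whether the document at position $k$ is relevant • NDCG at position $k$ for query $q$: $NDCG@k = Z_k \sum_{j=1}^{k} G(\pi^{-1}(j)) \, \eta(j)$, where $Z_k$ is the normalization, the sum is the cumulation, $G$ is the gain, and $\eta(j)$ is the position discount • MAP and NDCG: AP and NDCG averaged over all queries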
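These measures are short to implement. The sketch below makes assumptions the slide leaves open: the gain is taken as $G = 2^{\text{grade}} - 1$, the discount as $\eta(j) = 1 / \log_2(j + 1)$, $Z_k$ as the inverse of the ideal DCG, and the number of relevant documents is counted within the ranked list that is passed in. The function names are illustrative.

```python
import math


def precision_at_k(relevant_flags, k):
    """P@k: fraction of the top-k results that are relevant (flags are 0 or 1)."""
    return sum(relevant_flags[:k]) / k


def average_precision(relevant_flags):
    """AP: mean of P@k over the positions k that hold a relevant document."""
    n_relevant = sum(relevant_flags)
    if n_relevant == 0:
        return 0.0
    total = sum(
        precision_at_k(relevant_flags, k) * relevant_flags[k - 1]
        for k in range(1, len(relevant_flags) + 1)
    )
    return total / n_relevant


def dcg_at_k(grades, k):
    """Cumulated gain (2^grade - 1) with position discount 1 / log2(j + 1)."""
    return sum(
        (2 ** grade - 1) / math.log2(j + 1)
        for j, grade in enumerate(grades[:k], start=1)
    )


def ndcg_at_k(grades, k):
    """NDCG@k: DCG normalized by the DCG of the ideal ordering (Z_k)."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


def mean_over_queries(per_query_values):
    """MAP or mean NDCG: average a per-query measure over all queries."""
    return sum(per_query_values) / len(per_query_values)


# Binary relevance of the ranked results for two hypothetical queries.
query_1 = [1, 0, 1, 0]
query_2 = [0, 1, 0, 0]
print(mean_over_queries([average_precision(query_1), average_precision(query_2)]))

# Graded relevance of the ranked results for one hypothetical query.
print(ndcg_at_k([3, 2, 0, 1], 4))
```

Both measures weight the top of the list more heavily: AP through the precision at each relevant position, NDCG through the explicit position discount.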
  • 59. Evaluation Measure: Summary • Query-level: every query contributes equally to the measure (computed on documents associated with the same query; bounded for each query; averaged over all test queries) • Position-based: rank position is explicitly used for weighting (top-ranked objects are more important; relative order matters rather than the relevance score of each document; rank is a non-continuous, non-differentiable function of the scores)
  • 60. Part V: The Challenges and the Future of Search
  • 61. The Challenges of Enterprise Search • Near duplicates and versioning • More recently, “quoting” between websites • Metadata and file formats • Search across multiple sources: how to merge several indexes? Challenges with latency? • Security, privacy, regulations
  • 62. Future Research • User logs as ground truth: a gold mine that has not been leveraged so far; implicit feedback; click-through rates, etc. • Feature engineering • New directions of research: semi-supervised ranking, transfer ranking
  • 63. Conclusions • While 20+ years old, Search is still hard • But there are off-the-shelf solutions… • A problem where ML can help (learning-to-rank space): the most promising algorithms use a listwise approach; a very dynamic area of research • But doing Search well requires more than Learning-to-Rank: query parsing, topic modeling, etc. • It is getting harder with ever more types of documents
  • 64. Thank You for Your Attention!
  • 65. References • Learning-to-Rank for Information Retrieval, by Tie-Yan Liu • Learning-to-Rank Tutorial, by Tie-Yan Liu • The PageRank Model, by Ian Rogers • Search is Hard, by Priyendra Deshwal • Why Is Enterprise Search so Hard?, by Miles Kehoe