Beyond Learning to Rank
Additional Machine Learning Approaches to Improve Search Relevancy
Who Am I?
• Chief Data Scientist at Dice.com, under Yuri Bykov
• Key projects using search:
  • Recommender systems – more jobs like this, more seekers like this (uses a custom Solr index)
  • Custom Dice Solr MLT handler (real-time recommendations)
  • 'Did you mean?' functionality
  • Title, skills and company type-ahead
  • Relevancy improvements in Dice jobs search
Other Projects
• Supply-demand analysis (see my blog post on this, with an accompanying data visualization)
• "Dice Career" (new Dice mobile app releasing soon on the iOS app store, coming soon to Android)
  • Includes a salary predictor
• Dice Skills pages – http://www.dice.com/skills

PhD
• PhD candidate at DePaul University, studying natural language processing and machine learning
Relevancy Tuning
Strengths and Limitations of Different Approaches
Relevancy Tuning – Common Approaches
• Gather a golden dataset of relevancy judgements:
  • Use a team of analysts to evaluate the quality of search results for a set of common user queries
  • Mine the search logs, capturing which documents were clicked for each query, and how long the user spent on the resulting webpage
• Define a metric that measures what you want to optimize your search engine for (MAP, NDCG, etc.)
• Tune the parameters of the search engine to improve performance on this dataset, using either:
  1. Manual tuning
  2. Grid search – brute-force search over a large list of parameters
  3. Machine Learned Ranking (MLR) – train a machine learning model to re-rank the top n search results
• This improves precision, by reducing the number of bad matches returned by the system
Manual Tuning
• Slow, manual task
• Not very objective or scientific, unless validated by computing metrics over a dataset of relevancy judgements

Grid Search
• Brute-force search over all the search parameters to find the optimal configuration – naïve, does not intelligently explore the parameter space
• Very slow; needs to run hundreds or thousands of searches to test each set of parameters
Machine Learned Ranking
• Uses a machine learning model, trained on a golden dataset, to re-rank the search results
• Machine learning algorithms are too slow to rank all documents in moderate to large indexes
• The typical approach is to take the top N documents returned by the search engine and re-rank them
• What if those top N documents weren't the most relevant?
• Two possibilities:
  1. The top N documents don't contain all the relevant documents (a recall problem)
  2. The top N documents contain some irrelevant documents (a precision problem)
• MLR systems can be prone to feedback loops, where they influence their own training data
Talk Overview
This talk will cover the following 3 topics:
1. Conceptual Search
  • Solves the recall problem
2. How to Automatically Optimize the Search Engine Configuration
  • Improves the quality of the top search results, prior to re-ranking
3. The problem of feedback loops, and how to prevent them
Part 1: Conceptual Search
Q. What is the Most Common Relevancy Tuning Mistake?
A. Ignoring the importance of RECALL
Relevancy Tuning
• Key performance metrics to measure:
  • Precision
  • Recall
  • F1 measure = 2*(P*R)/(P+R)
• Precision is easier – correct mistakes in the top search results
• Recall – you need to know which relevant documents don't come back
  • Hard to accurately measure
  • Requires knowing all the relevant documents present in the index
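As a concrete sketch of these three metrics for a single query, with toy document ids (`returned` is what the engine gave back, `relevant` is the full judged-relevant set):

```python
# Precision, recall and F1 for one query, given the set of returned
# documents and the full set of judged-relevant documents (toy ids).
def precision_recall_f1(returned, relevant):
    true_pos = len(returned & relevant)
    precision = true_pos / len(returned) if returned else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f1 = precision_recall_f1({"d1", "d2", "d3", "d4"},
                               {"d1", "d2", "d5", "d6", "d7", "d8"})
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.5 0.33 0.4
```

Note that recall's denominator is every relevant document in the index, which is exactly the quantity that is hard to know in practice.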
What is Conceptual Search?
• A.K.A. semantic search
• Two key challenges with keyword matching:
  • Polysemy: words have more than one meaning
    • e.g. engineer – mechanical? programmer? automation engineer?
  • Synonymy: many different words have the same meaning
    • e.g. QA, quality assurance, tester; VB, Visual Basic, VB.Net
• Other related challenges: typos, spelling errors, idioms
• Conceptual search attempts to solve these problems by learning concepts
Why Conceptual Search?
• We will attempt to improve recall without diminishing precision
• Conceptual search can match relevant documents containing none of the query terms
• I will use the popular Lucene-based Solr search engine to illustrate how you can implement conceptual search, but the techniques can be applied to any search engine
Concepts
• Conceptual search allows us to retrieve documents by how similar the concepts in the query are to the concepts in a document
• Concepts represent important high-level ideas in a given domain (e.g. java technologies, big data jobs, helpdesk support, etc.)
• Concepts are automatically learned from documents using machine learning
• Words can belong to multiple concepts, with varying strengths of association with each concept
Traditional Techniques
• Many algorithms have been used for concept learning, including LSA (Latent Semantic Analysis), LDA (Latent Dirichlet Allocation) and Word2Vec
• All involve mapping a document to a low-dimensional dense vector (an array of numbers)
• Each element of the vector is a number representing how well the document represents that concept
• e.g. LSA powers the similar skills found in Dice's skills pages
• See "From Frequency to Meaning: Vector Space Models of Semantics" for more information on traditional vector space models of word meaning
Traditional Techniques Don't Scale
• LSA/LSI, LDA and related techniques rely on factorization of very large term-document matrices – very slow and computationally intensive
• They require embedding a machine learning model within the search engine to map new queries to the concept space (latent or topic space)
• Query performance is very poor – unable to utilize the inverted index, as all documents have the same number of concepts
• What we want is a way to map words, not documents, to concepts; then we can embed this in Solr via synonym filters and custom query parsers
Word2Vec and 'Word Math'
• Word2Vec was developed by Google around 2013 for learning vector representations for words, building on earlier work from Rumelhart, Hinton and Williams in 1986 (see the paper below for the citation of this work)
• Word2Vec paper: "Efficient Estimation of Word Representations in Vector Space"
• It works by training a machine learning model to predict the words surrounding a word in a sentence
• Similar words get similar vector representations
• Scales well to very large datasets – no matrix factorization
ā€œWord Mathā€ Example
ā€¢ Using basic vector arithmetic, you get some interesting patterns
ā€¢ This illustrates how it represents relationships between words
ā€¢ E.g. man ā€“ king + woman = queen
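A toy sketch of this "word math", using hand-made 3-dimensional vectors (real Word2Vec vectors are learned from text and have 100+ dimensions; the values below are invented purely for illustration):

```python
import math

# Hypothetical toy vectors: dimensions loosely encode (royalty, male, female)
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(vec, exclude):
    # Vocabulary word whose vector is closest to vec by cosine similarity
    return max((w for w in vectors if w not in exclude),
               key=lambda w: cosine(vectors[w], vec))

# king - man + woman lands nearest to queen
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]
print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```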
The algorithm learns to represent different types of relationships between words in vector form
Why Do I Care? This is a Search Meetup…
• This algorithm can be used to represent documents as vectors of concepts
• We can then use these representations to do conceptual search
• This will surface many relevant documents missed by keyword matching
  • This boosts recall
• This technique can also be used to automatically learn synonyms
A Quick Demo
Using our Dice active jobs index, some example common user queries:
• Data Scientist
• Big Data
• Information Retrieval
• C#
• Web Developer
• CTO
• Project Manager
Note: none of the matching documents shown would be returned by keyword matching
How?
GitHub – DiceTechJobs/ConceptualSearch:
1. Pre-process documents – parse HTML, strip noise characters, tokenize words
2. Define the important keywords for your domain, or use my code to auto-extract the top terms and phrases
3. Train a Word2Vec model on your documents
4. Using this model, either:
  1. Vectors: use the raw vectors and embed them in Solr using synonyms + payloads
  2. Top N Similar: extract the top n similar terms with their similarities, and embed these as weighted synonyms using my custom queryboost parser and tokenizer
  3. Clusters: cluster these vectors by similarity, and map terms to clusters in a synonym file
Define Top Domain Specific Keywords
• If you have a set of documents belonging to a specific domain, it is important to define the important keywords for that domain:
  • Use the top few thousand search keywords
  • Or use my fast keyword and phrase extraction tool (in GitHub)
  • Or use a shingle filter to extract the top 1-4 word sequences (ngrams) by document frequency
• It is important to map common phrases to single tokens, e.g. data scientist => data_scientist, java developer => java_developer
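This phrase-to-token mapping can be sketched as a greedy longest-match pass over the token stream, similar in spirit to how the Solr synonym filter matches multi-token entries (the phrase list and input text below are illustrative):

```python
# Map common multi-word phrases to single tokens before Word2Vec training.
# Greedy longest-match: at each position, try the longest phrase first.
phrases = {("data", "scientist"): "data_scientist",
           ("java", "developer"): "java_developer"}
max_len = max(len(p) for p in phrases)

def merge_phrases(tokens):
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrases:
                out.append(phrases[candidate])  # replace phrase with one token
                i += n
                break
        else:
            out.append(tokens[i])  # no phrase starts here; keep the word
            i += 1
    return out

print(merge_phrases("senior java developer and data scientist".split()))
# ['senior', 'java_developer', 'and', 'data_scientist']
```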
Do It Yourself
• All code for this talk is now publicly available on GitHub:
  • https://github.com/DiceTechJobs/SolrPlugins – Solr plugins to work with conceptual search, and other Dice plugins, such as a custom MLT handler
  • https://github.com/DiceTechJobs/SolrConfigExamples – examples of Solr configuration file entries to enable conceptual search and other Dice plugins
  • https://github.com/DiceTechJobs/ConceptualSearch – Python code to compute the Word2Vec word vectors and generate Solr synonym files
Some Solr Tricks to Make this Happen
1. Keyword Extraction: use the synonym filter to extract keywords from your documents
• Alternatively (for Solr), use the SolrTextTagger (CareerBuilder use this)
• Both the synonym filter and the SolrTextTagger use a finite state transducer
• This gives you a very naïve but also very fast way to extract entities from text
• Uses a greedy algorithm – if there are multiple possible matches, it takes the one with the most tokens
• This is the recommended approach for fast entity extraction; if you need a more accurate approach, train a Named Entity Recognition model
2. Synonym Expansion using Payloads:
  • Use the synonym filter to expand a keyword to multiple tokens
  • Each token has an associated payload – used to adjust relevancy scores at index or query time
  • If we do this at query time, it can be considered query term expansion, using word vector similarity to determine the boosts of the related terms
Synonym File Examples – Vector Method (1)
• Each keyword maps to a set of tokens via a synonym file
• Example vector synonym file entry (5-element vector; real vectors usually have 100+ elements):
  • java developer=>001|0.4 002|0.1 003|0.5 005|.9
• Uses a custom token filter that averages these vectors over the entire document (see GitHub – DiceTechJobs/SolrPlugins)
• Relatively fast at index time, but some additional indexing overhead
• Very slow to query
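Conceptually, the averaging the custom token filter performs looks like the sketch below: each keyword expands (via the synonym file) to its concept vector, and the document's vector is the element-wise mean (the synonym entries and vector values are toy examples, not real model output):

```python
# Toy synonym-file lookup: keyword -> concept vector (5 dims for brevity)
synonyms = {
    "java_developer": [0.4, 0.1, 0.5, 0.0, 0.9],
    "j2ee":           [0.5, 0.2, 0.4, 0.0, 0.8],
}

def document_vector(keywords):
    # Look up each keyword's concept vector, skipping unknown terms
    vecs = [synonyms[k] for k in keywords if k in synonyms]
    if not vecs:
        return None
    # Element-wise mean across the keyword vectors
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

doc_vec = document_vector(["java_developer", "j2ee", "unknown_term"])
print([round(x, 2) for x in doc_vec])  # [0.45, 0.15, 0.45, 0.0, 0.85]
```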
Synonym File Examples – Top N Method (2)
• Each keyword maps to a set of most-similar keywords via a synonym file
• Top N synonym file entry (top 5):
  • java_developer=>java_j2ee_developer|0.907526 java_architect|0.889903 lead_java_developer|0.867594 j2ee_developer|0.864028 java_engineer|0.861407
• Can be configured in Solr at index time with payloads, a payload-aware query parser and a payload similarity function
• Or you can configure this at query time with a special token filter that converts payloads into term boosts, along with a special parser (see GitHub – DiceTechJobs/SolrPlugins)
• Fast at index and query time if N is reasonable (10-30)
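A simplified sketch of how such a top-N entry could be generated from trained word vectors using cosine similarity (the terms and vectors are toy values; the actual generation code is in DiceTechJobs/ConceptualSearch):

```python
import math

# Toy trained vectors; real vectors come from the Word2Vec model
vectors = {
    "java_developer":   [0.90, 0.80, 0.10],
    "java_architect":   [0.85, 0.75, 0.20],
    "j2ee_developer":   [0.80, 0.85, 0.15],
    "registered_nurse": [0.05, 0.10, 0.95],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def synonym_line(term, n=2):
    # Score every other term against `term`, keep the n most similar
    scored = sorted(((cosine(vectors[term], v), w)
                     for w, v in vectors.items() if w != term), reverse=True)
    expansion = " ".join(f"{w}|{s:.6f}" for s, w in scored[:n])
    return f"{term}=>{expansion}"

print(synonym_line("java_developer"))
```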
Searching over Clustered Terms
• After we have learned word vectors, we can use a clustering algorithm to cluster terms by their vectors, giving clusters of related words
• Each keyword is mapped to its cluster, and matching occurs between clusters
• We can learn several different sizes of cluster, such as 500, 1000 and 5000 clusters
• Apply stronger boosts to fields with smaller clusters (e.g. the 5000-cluster field) using the edismax qf parameter – tighter clusters get more weight
• Code for clustering vectors is in GitHub – DiceTechJobs/ConceptualSearch
Synonym File Examples – Clustering Method (3)
• Each keyword in a cluster maps to the same artificial token for that cluster
• Cluster synonym file entries:
  • java=>cluster_171
  • java applications=>cluster_171
  • java coding=>cluster_171
  • java design=>cluster_171
• Doesn't use payloads, so it does not require any special plugins
• No noticeable impact on query or indexing performance
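A minimal k-means sketch of the clustering step, emitting cluster synonym entries of the form above (terms, 2-d vectors and k=2 are toy values; real runs cluster 100+-dimension vectors into hundreds to thousands of clusters):

```python
import math
import random

# Toy word vectors: two "java" terms and two "nursing" terms
vectors = {
    "java":        [0.90, 0.10],
    "java_coding": [0.85, 0.15],
    "nursing":     [0.10, 0.90],
    "rn":          [0.15, 0.85],
}

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(data, k, iters=10, seed=0):
    random.seed(seed)
    centroids = random.sample(list(data.values()), k)
    clusters = {}
    for _ in range(iters):
        # Assign each term to its nearest centroid
        clusters = {i: [] for i in range(k)}
        for term, vec in data.items():
            best = min(range(k), key=lambda i: dist(vec, centroids[i]))
            clusters[best].append(term)
        # Recompute each centroid as the mean of its assigned vectors
        for i, terms in clusters.items():
            if terms:
                centroids[i] = [sum(d) / len(terms)
                                for d in zip(*(data[t] for t in terms))]
    return clusters

# Emit synonym file lines: every term in a cluster maps to the same token
for cid, terms in kmeans(vectors, k=2).items():
    for term in terms:
        print(f"{term}=>cluster_{cid}")
```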
Example Clusters Learned from Dice Job Postings
• Note: the category labels are manually assigned for interpretability:
• Natural Languages: bi lingual, bilingual, chinese, fluent, french, german, japanese, korean, lingual, localized, portuguese, russian, spanish, speak, speaker
• Apple Programming Languages: cocoa, swift
• Search Engine Technologies: apache solr, elasticsearch, lucene, lucene solr, search, search engines, search technologies, solr, solr lucene
• Microsoft .Net Technologies: c# wcf, microsoft c#, microsoft.net, mvc web, wcf web services, web forms, webforms, windows forms, winforms, wpf wcf
Example Clusters Learned from Dice Job Postings
• Attention/Attitude: attention, attentive, close attention, compromising, conscientious, conscious, customer oriented, customer service focus, customer service oriented, deliver results, delivering results, demonstrated commitment, dependability, dependable, detailed oriented, diligence, diligent, do attitude, ethic, excellent follow, extremely detail oriented, good attention, meticulous, meticulous attention, organized, orientated, outgoing, outstanding customer service, pay attention, personality, pleasant, positive attitude, professional appearance, professional attitude, professional demeanor, punctual, punctuality, self motivated, self motivation, superb, superior, thoroughness
Ideas for Future Work
• Use a machine learning algorithm to learn a sparse representation of the word vectors
• Use word vectors derived from a word-context matrix instead of Word2Vec – gives a sparse vector representation of each word (e.g. HAL – Hyperspace Analogue to Language)
• Incorporate several different vector space models that focus on different aspects of word meaning, such as Dependency-Based Word Embeddings (uses grammatical relationships) and the HAL model mentioned above (weights terms by their proximity in the local context)
Conceptual Search Summary
• It's easy to overlook recall when performing relevancy tuning
• Conceptual search improves recall while maintaining high precision by matching documents on concepts or ideas
• In practice, this involves learning which terms are related to one another
• Word2Vec is a scalable algorithm for learning related words from a set of documents, and gives state-of-the-art results on word analogy tasks
• We can train a Word2Vec model offline and embed its output into Solr using the built-in synonym filter and payload functionality, combined with some custom plugins
• Video of my Lucene Revolution 2015 conceptual search talk
Part 2 – How to Automatically Optimize the Search Engine Configuration
Search Engine Configuration
• Modern search engines contain a lot of parameters that can impact the relevancy of the results
• These tend to fall into two main areas:
1. Query Configuration Parameters
  • Control how the search engine generates queries and computes relevancy scores
  • What boosts to specify per field? What similarity algorithm to use? TF-IDF, BM25, custom?
  • Disable length normalization on some fields?
  • Boosting by attributes – how to boost by the age or location of the document
2. Index and Query Analysis Configuration
  • Controls how documents and queries get tokenized for the purpose of matching
  • Use stemming? Synonyms? Stop words? Ngrams?
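To make the query-configuration side concrete, a hypothetical edismax parameter set touching several of the knobs above (the field names and boost values are invented for illustration; `recip(ms(NOW,...))` is the standard Solr function-query recipe for boosting newer documents):

```
defType=edismax
qf=title^5 skills^3 description^1
tie=0.1
bf=recip(ms(NOW,post_date),3.16e-11,1,1)
```

Every number here (per-field boosts, tie-breaker, date-decay constants) is a tunable parameter of the kind the optimization below searches over.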
How Do We Ensure the Optimal Configuration?
• Testing on a golden test set:
  • Run the top few hundred or thousand queries and measure IR metrics
  • Takes a long time to evaluate each configuration – minutes to hours
• Common approaches:
  • Manual tuning?
    • Slow (human in the loop) and subjective
  • Grid search – try every possible combination of settings?
    • Naïve – searches the entire grid, does not adjust its search based on its findings
    • The slow test time limits how many configurations we can try
• Can we do better?
Can We Apply Machine Learning?
• We have a labelled dataset of queries with relevancy judgements
• Can we frame this as a supervised learning problem and apply gradient descent?
• No…
  • Popular IR metrics (MAP, NDCG, etc.) are non-differentiable
  • The set of query configuration and analysis settings cannot be optimized in this way, as a gradient is unavailable
Solution: Use a Black Box Optimization Algorithm
• Use an optimization algorithm to optimize a 'black box' function
• Black box function – we provide the optimization algorithm with a function that takes a set of parameters as inputs and computes a score
• The black box algorithm will then try to choose parameter settings that optimize the score
• This can be thought of as a form of reinforcement learning
• These algorithms intelligently search the space of possible search configurations to arrive at a solution
Implementation
• The black box function:
  • Inputs: search engine configuration settings
  • Output: a score denoting the performance of that search engine configuration on the golden test set
• Example optimization algorithms:
  • Co-ordinate ascent
  • Genetic algorithms
  • Bayesian optimization
  • Simulated annealing
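As a minimal sketch of the black-box setup, here is co-ordinate ascent (the simplest algorithm on the list above) over two hypothetical boost parameters. In a real run, `evaluate` would apply the configuration, run the golden queries and return the IR metric; the quadratic stand-in below is invented so the example is self-contained:

```python
def evaluate(config):
    # Stand-in black box: pretend the best settings are
    # title_boost=5, skills_boost=3 (higher score is better).
    return -((config["title_boost"] - 5) ** 2 +
             (config["skills_boost"] - 3) ** 2)

def coordinate_ascent(config, candidates, rounds=3):
    best = dict(config)
    best_score = evaluate(best)
    for _ in range(rounds):
        for param, values in candidates.items():
            # Optimize one parameter at a time, holding the rest fixed
            for v in values:
                trial = dict(best, **{param: v})
                score = evaluate(trial)
                if score > best_score:
                    best, best_score = trial, score
    return best, best_score

best, score = coordinate_ascent(
    {"title_boost": 1, "skills_boost": 1},
    {"title_boost": range(1, 11), "skills_boost": range(1, 11)})
print(best)  # {'title_boost': 5, 'skills_boost': 3}
```

Unlike grid search, each parameter sweep here starts from the best configuration found so far, so far fewer evaluations of the slow black box are needed.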
Implementation
• The problem:
  • Optimize the configuration parameters of the search engine powering our content-based recommender engine
• IR metric to optimize:
  • Mean Average Precision at 5 documents, averaged over all recommender queries
• Train/test split:
  • 80% training data, 20% test data
• Optimization library:
  • GPyOpt – open-source Bayesian optimization library from the University of Sheffield, UK
• Result:
  • 13% improvement in MAP@5 on the test dataset
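The metric being optimized can be sketched as follows, using a common definition of average precision truncated at 5 (the queries and relevance judgements are toy data):

```python
# Mean Average Precision at k over a set of queries.
def average_precision_at_k(ranked, relevant, k=5):
    hits, score = 0, 0.0
    for i, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            score += hits / i  # precision at each rank where a hit occurs
    return score / min(len(relevant), k) if relevant else 0.0

def mean_ap_at_k(results, relevant, k=5):
    return sum(average_precision_at_k(results[q], relevant[q], k)
               for q in results) / len(results)

results = {"q1": ["d1", "d2", "d3", "d4", "d5"],
           "q2": ["d9", "d1", "d7", "d2", "d8"]}
relevant = {"q1": {"d1", "d3"}, "q2": {"d1", "d7"}}
print(round(mean_ap_at_k(results, relevant), 4))
```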
Ideas for Future Work
• Learn a better ranking function for the search engine:
  • Use genetic programming to evolve a ranking function that works better on our dataset
  • Use metric learning algorithms (e.g. large margin nearest neighbor)
Part 3: Beware of Feedback Loops
Building a Machine Learning System
1. Users interact with the system to
produce data
2. Machine learning algorithms turn
that data into a model
What happens if the model's predictions influence the user's behavior?
Feedback Loops
1. Users produce labelled data
2. Machine learning algorithms turn
that data into a model
3. Model changes user behavior,
modifying its own future training
data
Direct Feedback Loops
• If the predictions of the machine learning system alter user behavior in such a way as to affect its own training data, then you have a feedback loop
• Because the system is influencing its own training data, this can lead to gradual changes in system behavior over time
• This can lead to behavior that is hard to predict before the system is released into production
• Examples: recommender systems, search engines that use machine-learned ranking models, route-finding systems, ad-targeting systems
Hidden Feedback Loops
• You can also get complex interactions when you have two separate machine learning models that influence each other through the real world
• For example, improvements to a search engine might result in more purchases of certain products, which in turn could influence a recommender system that uses item popularity as one of the features used to generate recommendations
• Feedback loops, and many other challenges involved in building and maintaining machine learning systems in the wild, are covered in these two excellent papers from Google:
  • "Machine Learning: The High-Interest Credit Card of Technical Debt"
  • "Hidden Technical Debt in Machine Learning Systems"
Preventing Feedback Loops
1. Isolate a subset of the data from being influenced by the model, for use in training the system
  • e.g. generate a subset of recommendations at random, or by using an unsupervised model
  • e.g. leave a small proportion of user searches un-ranked by the MLR model
2. Use a reinforcement learning model instead (such as a multi-armed bandit) – the system will dynamically adapt to users' behavior, balancing exploring different hypotheses with exploiting what it has learned to produce accurate predictions
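A minimal epsilon-greedy bandit sketch of option 2: the system mostly exploits its best-performing arm, but keeps exploring with probability epsilon, so its training data is never wholly shaped by its own predictions (the two "rankers" and their click rates are simulated toy values):

```python
import random

random.seed(42)
arms = {"ranker_a": 0.30, "ranker_b": 0.55}  # hidden true click rates
clicks = {a: 0 for a in arms}
shows = {a: 0 for a in arms}

def choose(epsilon=0.1):
    if random.random() < epsilon or not any(shows.values()):
        return random.choice(list(arms))  # explore: pick an arm at random
    # Exploit: pick the arm with the best empirical click-through rate
    return max(arms, key=lambda a: clicks[a] / shows[a] if shows[a] else 0.0)

for _ in range(5000):
    arm = choose()
    shows[arm] += 1
    clicks[arm] += random.random() < arms[arm]  # simulated user click

# The bandit converges on the arm with the higher true click rate,
# while the epsilon fraction of random traffic keeps the data unbiased.
print(shows)
```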
Summary
• To improve relevancy in a search engine, it is important first to gather a golden dataset of relevancy judgements
• Machine learned ranking is an excellent approach for improving relevancy
  • However, it mainly focuses on improving precision, not recall, and relies heavily on the quality of the top search results
• Conceptual search is one approach for improving the recall of the search engine
• Black box optimization algorithms can be used to auto-tune the search engine configuration to improve relevancy
• Search and recommendation engines that use supervised machine learning models are prone to feedback loops, which can lead to poor quality predictions
END OF TALK – Questions?

Ā 
Software Design
Software DesignSoftware Design
Software DesignAhmed Misbah
Ā 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisCrowdFlower
Ā 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
Ā 
Relevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search TechnologiesRelevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search Technologiesenterprisesearchmeetup
Ā 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2GokulD
Ā 
ModelMine a tool to facilitate mining models from open source repositories pr...
ModelMine a tool to facilitate mining models from open source repositories pr...ModelMine a tool to facilitate mining models from open source repositories pr...
ModelMine a tool to facilitate mining models from open source repositories pr...Sayed Mohsin Reza
Ā 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchRachel Berryman
Ā 
Search enabled applications with lucene.net
Search enabled applications with lucene.netSearch enabled applications with lucene.net
Search enabled applications with lucene.netWillem Meints
Ā 
The recommendations system for source code components retrieval
The recommendations system for source code components retrievalThe recommendations system for source code components retrieval
The recommendations system for source code components retrievalAYESHA JAVED
Ā 

Dice.com Bay Area Search - Beyond Learning to Rank Talk

• 1. Beyond Learning to Rank
Additional Machine Learning Approaches to Improve Search Relevancy
• 2. Who Am I?
• Chief Data Scientist at Dice.com, under Yuri Bykov
• Key projects using search:
• Recommender systems – more jobs like this, more seekers like this (uses a custom Solr index)
• Custom Dice Solr MLT handler (real-time recommendations)
• 'Did you mean?' functionality
• Title, skills and company type-ahead
• Relevancy improvements in Dice jobs search
• 3. Other Projects
• Supply Demand Analysis (see my blog post on this, with an accompanying data visualization)
• "Dice Career" (new Dice mobile app releasing soon on the iOS app store, coming soon to Android)
• Includes a salary predictor
• Dice Skills pages – http://www.dice.com/skills
PhD
• PhD candidate at DePaul University, studying natural language processing and machine learning
• 4.
• 5.
• 6.
• 7. Relevancy Tuning
Strengths and Limitations of Different Approaches
• 8. Relevancy Tuning – Common Approaches
• Gather a golden dataset of relevancy judgements:
• Use a team of analysts to evaluate the quality of search results for a set of common user queries
• Mine the search logs, capturing which documents were clicked for each query and how long the user spent on the resulting webpage
• Define a metric that measures what you want to optimize your search engine for (MAP, NDCG, etc.)
• Tune the parameters of the search engine to improve performance on this dataset, using either:
1. Manual Tuning
2. Grid Search – brute-force search over a large list of parameters
3. Machine Learned Ranking (MLR) – train a machine learning model to re-rank the top n search results
• This improves precision, by reducing the number of bad matches returned by the system
• 9. Manual Tuning
• A slow, manual task
• Not very objective or scientific, unless validated by computing metrics over a dataset of relevancy judgements
Grid Search
• Brute-force search over all the search parameters to find the optimal configuration – naïve, as it does not intelligently explore the parameter search space
• Very slow – needs to run hundreds or thousands of searches to test each set of parameters
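A brute-force grid search like the one described can be sketched in a few lines. The parameter names and the mock evaluation metric below are purely illustrative; in practice `evaluate` would run the golden queries against the search engine and compute MAP or NDCG, which is why every extra grid cell is so expensive.

```python
import itertools

def grid_search(param_grid, evaluate):
    """Exhaustively score every parameter combination; return the best."""
    best_score, best_params = float("-inf"), None
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Toy stand-in for "run the queries and compute MAP/NDCG": pretend the
# metric peaks at title_boost=2.0, skills_boost=1.5.
def mock_metric(params):
    return -abs(params["title_boost"] - 2.0) - abs(params["skills_boost"] - 1.5)

grid = {"title_boost": [0.5, 1.0, 2.0, 4.0], "skills_boost": [0.5, 1.5, 3.0]}
best, score = grid_search(grid, mock_metric)
print(best)  # {'skills_boost': 1.5, 'title_boost': 2.0}
```

Even this tiny grid has 12 cells, and each cell costs a full evaluation run over every query in the golden set, which is where the hundreds or thousands of searches come from.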
• 10. Machine Learned Ranking
• Uses a machine learning model, trained on a golden dataset, to re-rank the search results
• Machine learning algorithms are too slow to rank all documents in moderate to large indexes
• The typical approach is to take the top N documents returned by the search engine and re-rank them
• What if those top N documents weren't the most relevant? Two possibilities:
1. The top N documents don't contain all relevant documents (a recall problem)
2. The top N documents contain some irrelevant documents (a precision problem)
• MLR systems can be prone to feedback loops, where they influence their own training data
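The re-rank-the-top-N step can be sketched as follows. The feature set and the linear model weights here are hypothetical stand-ins for a trained MLR model; the point is that only the head of the result list is re-scored, so anything the engine failed to retrieve in the top N can never be recovered.

```python
# Hypothetical feature extractor: title/query term overlap plus posting age.
def extract_features(query, doc):
    terms = set(query.lower().split())
    title = doc["title"].lower().split()
    overlap = sum(1 for t in title if t in terms)
    return [overlap, doc["days_old"]]

WEIGHTS = [1.0, -0.1]  # reward title overlap, penalise stale postings

def score(features):
    return sum(w * f for w, f in zip(WEIGHTS, features))

def rerank_top_n(query, results, n=100):
    head, tail = results[:n], results[n:]  # only the top n are re-scored
    head = sorted(head, key=lambda d: score(extract_features(query, d)),
                  reverse=True)
    return head + tail

docs = [
    {"title": "qa tester", "days_old": 30},
    {"title": "java developer", "days_old": 2},
    {"title": "senior java developer", "days_old": 1},
]
top = rerank_top_n("java developer", docs, n=3)
print([d["title"] for d in top])
# ['senior java developer', 'java developer', 'qa tester']
```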
• 11. Talk Overview
This talk covers the following three topics:
1. Conceptual Search
• Solves the recall problem
2. How to Automatically Optimize the Search Engine Configuration
• Improves the quality of the top search results, prior to re-ranking
3. The problem of feedback loops, and how to prevent them
• 13. Q. What is the Most Common Relevancy Tuning Mistake?
• 14. Q. What is the Most Common Relevancy Tuning Mistake?
A. Ignoring the importance of RECALL
• 15. Relevancy Tuning
• Key performance metrics to measure:
• Precision
• Recall
• F1 Measure – 2*(P*R)/(P+R)
• Precision is easier – correct mistakes in the top search results
• Recall – you need to know which relevant documents don't come back
• Hard to measure accurately
• Need to know all the relevant documents present in the index
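Precision, recall and the F1 measure above can be computed directly from the set of retrieved document ids and the set of known-relevant ids; a minimal sketch:

```python
def precision_recall_f1(retrieved, relevant):
    """Compute P, R and F1 = 2*(P*R)/(P+R) from two id collections."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# 4 of the 5 returned docs are relevant (good precision), but only 4 of the
# 8 relevant docs in the index came back (poor recall):
p, r, f1 = precision_recall_f1(
    ["d1", "d2", "d3", "d4", "d9"],
    ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"])
print(p, r, round(f1, 3))  # 0.8 0.5 0.615
```

Note that the relevant set is exactly the hard part: computing recall presumes you already know every relevant document in the index.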
• 16. What is Conceptual Search?
• A.K.A. semantic search
• Two key challenges with keyword matching:
• Polysemy: words have more than one meaning
• e.g. engineer – mechanical? programmer? automation engineer?
• Synonymy: many different words have the same meaning
• e.g. QA, quality assurance, tester; VB, Visual Basic, VB.Net
• Other related challenges: typos, spelling errors, idioms
• Conceptual search attempts to solve these problems by learning concepts
• 17. Why Conceptual Search?
• We will attempt to improve recall without diminishing precision
• Can match relevant documents containing none of the query terms
• I will use the popular Lucene-based search engine Solr to illustrate how you can implement conceptual search, but the techniques can be applied to any search engine
• 18. Concepts
• Conceptual search allows us to retrieve documents by how similar the concepts in the query are to the concepts in a document
• Concepts represent important high-level ideas in a given domain (e.g. Java technologies, big data jobs, helpdesk support, etc.)
• Concepts are automatically learned from documents using machine learning
• Words can belong to multiple concepts, with varying strengths of association with each concept
• 19. Traditional Techniques
• Many algorithms have been used for concept learning, including LSA (Latent Semantic Analysis), LDA (Latent Dirichlet Allocation) and Word2Vec
• All involve mapping a document to a low-dimensional dense vector (an array of numbers)
• Each element of the vector is a number representing how well the document represents that concept
• E.g. LSA powers the similar skills found on Dice's skills pages
• See "From Frequency to Meaning: Vector Space Models of Semantics" for more information on traditional vector space models of word meaning
• 20. Traditional Techniques Don't Scale
• LSA/LSI, LDA and related techniques rely on factorization of very large term-document matrices – very slow and computationally intensive
• They require embedding a machine learning model within the search engine to map new queries to the concept space (latent or topic space)
• Query performance is very poor – unable to utilize the inverted index, as all documents have the same number of concepts
• What we want is a way to map words, not documents, to concepts. Then we can embed this in Solr via synonym filters and custom query parsers
• 21. Word2Vec and 'Word Math'
• Word2Vec was developed at Google around 2013 for learning vector representations for words, building on earlier work from Rumelhart, Hinton and Williams in 1986 (see the paper below for a citation of this work)
• Word2Vec paper: "Efficient Estimation of Word Representations in Vector Space"
• It works by training a machine learning model to predict the words surrounding a word in a sentence
• Similar words get similar vector representations
• Scales well to very large datasets – no matrix factorization
• 22. "Word Math" Example
• Using basic vector arithmetic, you get some interesting patterns
• This illustrates how it represents relationships between words
• E.g. king – man + woman = queen
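The "word math" can be illustrated with tiny hand-made vectors (real word2vec vectors are learned from a corpus and have 100+ dimensions): combine the vectors with ordinary arithmetic, then find the vocabulary word closest to the result by cosine similarity.

```python
import math

# Toy 3-d vectors, constructed purely to illustrate the arithmetic.
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.0],
    "woman": [0.1, 0.2, 0.7],
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# king - man + woman ~= queen: exclude the input word, take the nearest rest.
target = add(sub(vecs["king"], vecs["man"]), vecs["woman"])
best = max((w for w in vecs if w != "king"), key=lambda w: cosine(target, vecs[w]))
print(best)  # queen
```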
• 23. The algorithm learns to represent different types of relationships between words in vector form
• 24. Why Do I Care? This is a Search Meetup…
• 25. Why Do I Care? This is a Search Meetup…
• This algorithm can be used to represent documents as vectors of concepts
• We can then use these representations to do conceptual search
• This will surface many relevant documents missed by keyword matching
• This boosts recall
• This technique can also be used to automatically learn synonyms
• 26. A Quick Demo
Using our Dice active jobs index, some example common user queries:
• Data Scientist
• Big Data
• Information Retrieval
• C#
• Web Developer
• CTO
• Project Manager
Note: all matching documents would NOT be returned by keyword matching
• 27. How? GitHub – DiceTechJobs/ConceptualSearch:
1. Pre-process documents – parse HTML, strip noise characters, tokenize words
2. Define important keywords for your domain, or use my code to auto-extract top terms and phrases
3. Train Word2Vec on the documents to produce a word2vec model
4. Using this model, either:
1. Vectors: use the raw vectors and embed them in Solr using synonyms + payloads
2. Top N Similar: or extract the top n similar terms with similarities, and embed these as weighted synonyms using my custom queryboost parser and tokenizer
3. Clusters: cluster these vectors by similarity, and map terms to clusters in a synonym file
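Step 4.2 (Top N Similar) boils down to ranking the rest of the vocabulary by cosine similarity against each keyword's vector. A sketch, with toy 3-dimensional vectors standing in for a trained word2vec model:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_n_similar(term, vectors, n=2):
    """Rank every other vocabulary term by cosine similarity to `term`."""
    scored = [(other, cosine(vectors[term], v))
              for other, v in vectors.items() if other != term]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:n]

# Toy vectors: the two developer-ish terms point roughly the same way.
vectors = {
    "java_developer":  [0.9, 0.1, 0.0],
    "j2ee_developer":  [0.8, 0.2, 0.1],
    "java_architect":  [0.7, 0.3, 0.0],
    "project_manager": [0.0, 0.1, 0.9],
}
print(top_n_similar("java_developer", vectors))
```

With real vectors, the (term, similarity) pairs produced here are exactly what gets written into the weighted synonym file shown later.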
• 28. Define Top Domain-Specific Keywords
• If you have a set of documents belonging to a specific domain, it is important to define the important keywords for that domain:
• Use the top few thousand search keywords
• Or use my fast keyword and phrase extraction tool (in GitHub)
• Or use a shingle filter to extract the top 1-4 word sequences (ngrams) by document frequency
• It is important to map common phrases to single tokens, e.g. data scientist => data_scientist, java developer => java_developer
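The phrase-to-single-token mapping can be done with a greedy longest-match pass over the token stream; a minimal sketch (the phrase list here is illustrative):

```python
def map_phrases(tokens, phrases, max_len=4):
    """Greedily replace known multi-word phrases with single underscore tokens."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):  # prefer the longest match
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrases:
                out.append(candidate.replace(" ", "_"))
                i += n
                break
        else:
            out.append(tokens[i])  # no phrase starts here; keep the token
            i += 1
    return out

phrases = {"data scientist", "java developer"}
print(map_phrases("senior java developer and data scientist".split(), phrases))
# ['senior', 'java_developer', 'and', 'data_scientist']
```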
• 29. Do It Yourself
• All code for this talk is now publicly available on GitHub:
• https://github.com/DiceTechJobs/SolrPlugins – Solr plugins to work with conceptual search, plus other Dice plugins such as a custom MLT handler
• https://github.com/DiceTechJobs/SolrConfigExamples – examples of Solr configuration file entries to enable conceptual search and other Dice plugins
• https://github.com/DiceTechJobs/ConceptualSearch – Python code to compute the Word2Vec word vectors and generate Solr synonym files
• 30. Some Solr Tricks to Make this Happen
1. Keyword Extraction: use the synonym filter to extract keywords from your documents
• 31. Some Solr Tricks to Make this Happen
1. Keyword Extraction: use the synonym filter to extract keywords from your documents
• Alternatively (for Solr) use the SolrTextTagger (CareerBuilder uses this)
• Both the synonym filter and the SolrTextTagger use a finite state transducer
• This gives you a very naïve but also very fast way to extract entities from text
• Uses a greedy algorithm – if there are multiple possible matches, it takes the one with the most tokens
• Recommended approach for fast entity extraction. If you need a more accurate approach, train a Named Entity Recognition model
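The greedy behaviour described (when several keywords could match at a position, take the one with the most tokens, consume it, move on) can be mimicked in a few lines of Python. This is a sketch of the idea only, not the finite-state-transducer implementation the synonym filter and SolrTextTagger actually use:

```python
def extract_keywords(text, keywords, max_len=4):
    """Greedy longest-match keyword extraction over a whitespace token stream."""
    tokens = text.lower().split()
    found, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):  # try longest spans first
            candidate = " ".join(tokens[i:i + n])
            if candidate in keywords:
                found.append(candidate)
                i += len(candidate.split())  # greedy: consume the whole match
                break
        else:
            i += 1
    return found

keywords = {"machine learning", "machine learning engineer", "python"}
print(extract_keywords("Senior Machine Learning Engineer with Python", keywords))
# ['machine learning engineer', 'python']
```

Note how the three-token "machine learning engineer" wins over the two-token "machine learning", exactly the most-tokens tie-break described above.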
• 32.
• 33. Some Solr Tricks to Make this Happen
1. Keyword Extraction: use the synonym filter to extract keywords from your documents
2. Synonym Expansion using Payloads:
• Use the synonym filter to expand a keyword to multiple tokens
• Each token has an associated payload – used to adjust relevancy scores at index or query time
• If we do this at query time, it can be considered query term expansion, using word vector similarity to determine the boosts of the related terms
• 34.
• 35. Synonym File Examples – Vector Method (1)
• Each keyword maps to a set of tokens via a synonym file
• Example vector synonym file entry (5-element vector; usually 100+ elements):
• java developer=>001|0.4 002|0.1 003|0.5 005|.9
• Uses a custom token filter that averages these vectors over the entire document (see GitHub – DiceTechJobs/SolrPlugins)
• Relatively fast at index time, but some additional indexing overhead
• Very slow to query
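A sketch of how an entry in that `dimension|weight` format might be parsed, and how the per-term vectors could then be averaged over a document. The parsing follows the example entry above; the actual Dice token filter is Java and lives in DiceTechJobs/SolrPlugins.

```python
def parse_vector_entry(line):
    """Parse 'java developer=>001|0.4 002|0.1 ...' into (term, {dim: weight})."""
    term, rhs = line.split("=>")
    vec = {}
    for pair in rhs.split():
        dim, weight = pair.split("|")
        vec[dim] = float(weight)
    return term, vec

def average_doc_vector(term_vectors):
    """Average the concept vectors of every extracted keyword in a document."""
    summed = {}
    for vec in term_vectors:
        for dim, w in vec.items():
            summed[dim] = summed.get(dim, 0.0) + w
    return {dim: w / len(term_vectors) for dim, w in summed.items()}

_, v1 = parse_vector_entry("java developer=>001|0.4 002|0.1 003|0.5 005|0.9")
_, v2 = parse_vector_entry("j2ee=>001|0.6 003|0.3")
print(average_doc_vector([v1, v2]))
```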
• 36. Synonym File Examples – Top N Method (2)
• Each keyword maps to a set of most similar keywords via a synonym file
• Top N synonym file entry (top 5):
• java_developer=>java_j2ee_developer|0.907526 java_architect|0.889903 lead_java_developer|0.867594 j2ee_developer|0.864028 java_engineer|0.861407
• Can be configured in Solr at index time with payloads, a payload-aware query parser and a payload similarity function
• Or configured at query time with a special token filter that converts payloads into term boosts, along with a special parser (see GitHub – DiceTechJobs/SolrPlugins)
• Fast at index and query time if N is reasonable (10-30)
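Generating such an entry from a list of (neighbour, similarity) pairs is straightforward; a sketch, with an illustrative neighbour list:

```python
def topn_synonym_line(term, neighbours, n=5):
    """Format a term and its (neighbour, similarity) pairs as one synonym-file
    line in the term|payload syntax shown above."""
    top = sorted(neighbours, key=lambda t: t[1], reverse=True)[:n]
    rhs = " ".join(f"{w}|{sim:.6f}" for w, sim in top)
    return f"{term}=>{rhs}"

neighbours = [("java_j2ee_developer", 0.907526),
              ("java_architect", 0.889903),
              ("qa_tester", 0.41)]
print(topn_synonym_line("java_developer", neighbours, n=2))
# java_developer=>java_j2ee_developer|0.907526 java_architect|0.889903
```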
• 37. Searching over Clustered Terms
• After we have learned word vectors, we can use a clustering algorithm to cluster terms by their vectors, giving clusters of related words
• Each keyword is mapped to its cluster, and matching occurs between clusters
• Can learn several different sizes of cluster, such as 500, 1000 or 5000 clusters
• Apply stronger boosts to fields with smaller clusters (e.g. the 5000-cluster field) using the edismax qf parameter – tighter clusters get more weight
• Code for clustering vectors is in GitHub – DiceTechJobs/ConceptualSearch
• 38. Synonym File Examples – Clustering Method (3)
• Each keyword in a cluster maps to the same artificial token for that cluster
• Cluster synonym file entries:
• java=>cluster_171
• java applications=>cluster_171
• java coding=>cluster_171
• java design=>cluster_171
• Doesn't use payloads, so does not require any special plugins
• No noticeable impact on query or indexing performance
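Once cluster assignments exist, producing the cluster synonym file is just a matter of writing one line per vocabulary term; a sketch, with hard-coded assignments standing in for the output of e.g. k-means over the learned word vectors:

```python
def cluster_synonym_lines(term_to_cluster):
    """Emit one 'term=>cluster_NNN' synonym line per vocabulary term."""
    return [f"{term}=>cluster_{cid}"
            for term, cid in sorted(term_to_cluster.items())]

# Illustrative assignments; in practice these come from clustering the vectors.
assignments = {"java": 171, "java coding": 171, "hadoop": 88}
for line in cluster_synonym_lines(assignments):
    print(line)
```

Because every term in a cluster rewrites to the same artificial token, ordinary keyword matching on the cluster fields does the concept matching, with no payloads or custom plugins required.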
  • 39. 3 9 Example Clusters Learned from Dice Job Postings ā€¢ Note: Labels in bold are manually assigned for interpretability: ā€¢ Natural Languages: bi lingual, bilingual, chinese, fluent, french, german, japanese, korean, lingual, localized, portuguese, russian, spanish, speak, speaker ā€¢ Apple Programming Languages: cocoa, swift ā€¢ Search Engine Technologies: apache solr, elasticsearch, lucene, lucene solr, search, search engines, search technologies, solr, solr lucene ā€¢ Microsoft .Net Technologies: c# wcf, microsoft c#, microsoft.net, mvc web, wcf web services, web forms, webforms, windows forms, winforms, wpf wcf
• 40. Example Clusters Learned from Dice Job Postings • Attention/Attitude: attention, attentive, close attention, compromising, conscientious, conscious, customer oriented, customer service focus, customer service oriented, deliver results, delivering results, demonstrated commitment, dependability, dependable, detailed oriented, diligence, diligent, do attitude, ethic, excellent follow, extremely detail oriented, good attention, meticulous, meticulous attention, organized, orientated, outgoing, outstanding customer service, pay attention, personality, pleasant, positive attitude, professional appearance, professional attitude, professional demeanor, punctual, punctuality, self motivated, self motivation, superb, superior, thoroughness
• 41. Ideas for Future Work • Use a machine learning algorithm to learn a sparse representation of the word vectors • Use word vectors derived from a word-context matrix instead of word2vec – gives a sparse vector representation of each word (e.g. HAL – Hyperspace Analogue to Language) • Incorporate several different vector space models that focus on different aspects of word meaning, such as Dependency-Based Word Embeddings (uses grammatical relationships) and the HAL model just mentioned (weights terms by their proximity in the local context)
• 42. Conceptual Search Summary • It's easy to overlook recall when performing relevancy tuning • Conceptual search improves recall while maintaining high precision by matching documents on concepts or ideas • In practice this involves learning which terms are related to one another • Word2Vec is a scalable algorithm for learning related words from a set of documents that gives state-of-the-art results on word analogy tasks • We can train a Word2Vec model offline and embed its output into Solr using the built-in synonym filter and payload functionality, combined with some custom plugins • Video of my Lucene Revolution 2015 Conceptual Search talk
• 43. Part 2 – How to Automatically Optimize the Search Engine Configuration
• 44. Search Engine Configuration • Modern search engines contain a lot of parameters that can impact the relevancy of the results • These tend to fall into 2 main areas: 1. Query Configuration Parameters • Control how the search engine generates queries and computes relevancy scores • What boosts to specify per field? What similarity algorithm to use? TF-IDF, BM25, custom? • Disable length normalization on some fields? • Boosting by attributes - how to boost by the age or location of the document 2. Index and Query Analysis Configuration • Controls how documents and queries get tokenized for the purpose of matching • Use stemming? Synonyms? Stop words? Ngrams?
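For concreteness, the query-side knobs described above might look like the following set of Solr edismax parameters. This is a hypothetical sketch: the field names and boost values are illustrative, not Dice's actual configuration.

```python
# Hypothetical Solr edismax query configuration, shown as the request
# parameters a client might send. Field names/boosts are made up.
query_config = {
    "defType": "edismax",
    "qf": "title^2.0 skills^1.5 description^0.5",   # per-field boosts
    "bf": "recip(ms(NOW,postdate),3.16e-11,1,1)",   # boost newer documents by age
    "mm": "2<75%",                                  # minimum-should-match rule
}
print(query_config)
```

Each of these values (and the analysis chain behind the fields) is a tunable parameter in the optimization problem discussed next.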
• 45. How Do We Ensure the Optimal Configuration? • Testing on a golden test set: • Run the top few hundred or thousand queries and measure IR metrics • Takes a long time to evaluate each configuration – minutes to hours • Common Approaches: • Manual Tuning? • Slow (human in the loop) and subjective • Grid Search – try every possible combination of settings? • Naïve – searches the entire grid, does not adjust its search based on its findings • Slow test time limits how many configurations we can try • Can we do better?
• 46. Can We Apply Machine Learning? • We have a labelled dataset of queries with relevancy judgements • Can we frame this as a supervised learning problem and apply gradient descent? • No… • Popular IR metrics (MAP, NDCG, etc.) are non-differentiable • The set of query configuration and analysis settings cannot be optimized in this way, as no gradient is available
• 47. Solution: Use a Black Box Optimization Algorithm • Use an optimization algorithm to optimize a 'black box' function • Black box function – provide the optimization algorithm with a function that takes a set of parameters as inputs and computes a score • The black box algorithm will then try to choose parameter settings that optimize the score • This can be thought of as a form of reinforcement learning • These algorithms intelligently search the space of possible search configurations to arrive at a solution
• 48. Implementation • The black box function: • Inputs: search engine configuration settings • Output: a score denoting the performance of that search engine configuration on the golden test set • Example optimization algorithms: • Coordinate ascent • Genetic algorithms • Bayesian optimization • Simulated annealing
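To make the black-box setup concrete, here is a minimal coordinate ascent sketch (one of the algorithms listed above): vary one parameter at a time and keep a change only if the score improves. The `fake_map` objective is a stand-in for actually running the golden query set against a search engine configuration; the parameter names are illustrative.

```python
# Sketch of coordinate ascent over a black-box evaluation function.
def coordinate_ascent(evaluate, params, step=0.5, sweeps=10):
    params = dict(params)
    best = evaluate(params)
    for _ in range(sweeps):
        improved = False
        for name in list(params):
            for delta in (step, -step):          # try nudging each param up/down
                trial = dict(params)
                trial[name] = params[name] + delta
                score = evaluate(trial)
                if score > best:                 # keep the change only if better
                    params, best, improved = trial, score, True
        if not improved:
            step /= 2                            # refine once no move helps
    return params, best

# Toy stand-in for "MAP on the golden test set": peaks at title=2.0, skills=1.0.
def fake_map(p):
    return 1.0 - (p["title_boost"] - 2.0) ** 2 - (p["skills_boost"] - 1.0) ** 2

best_params, best_score = coordinate_ascent(
    fake_map, {"title_boost": 0.0, "skills_boost": 0.0})
print(best_params, best_score)
```

In the real setting each call to `evaluate` is expensive (minutes to hours), which is exactly why sample-efficient methods such as Bayesian optimization are attractive.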
• 49. Implementation • The problem: • Optimize the configuration parameters of the search engine powering our content-based recommender engine • IR metric to optimize: • Mean Average Precision at 5 documents (MAP@5), averaged over all recommender queries • Train/test split: • 80% training data, 20% test data • Optimization library: • GPyOpt – open source Bayesian optimization library from the University of Sheffield, UK • Result: • 13% improvement in MAP@5 on the test dataset
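For reference, MAP@5 over binary relevancy judgements can be computed as in the sketch below. Note that conventions vary: this version divides by min(k, number of relevant documents); some implementations divide by the total number of relevant documents instead.

```python
# Sketch of MAP@k over a set of queries with binary relevancy judgements.
def average_precision_at_k(ranked_ids, relevant_ids, k=5):
    """AP@k for one query: average of precision@i at each relevant hit in the top k."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant_ids:
            hits += 1
            precisions.append(hits / i)
    denom = min(k, len(relevant_ids))
    return sum(precisions) / denom if denom else 0.0

def mean_average_precision(results, judgements, k=5):
    """results: query id -> ranked doc ids; judgements: query id -> set of relevant ids."""
    aps = [average_precision_at_k(results[q], judgements[q], k) for q in results]
    return sum(aps) / len(aps)

# Toy example: d1 relevant at rank 1, d3 relevant at rank 3.
score = mean_average_precision({"q1": ["d1", "d2", "d3"]}, {"q1": {"d1", "d3"}})
```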
• 50. Ideas for Future Work • Learn a better ranking function for the search engine: • Use genetic programming to evolve a ranking function that works better on our dataset • Use metric learning algorithms (e.g. large margin nearest neighbor)
  • 51. Part 3: Beware of Feedback Loops
• 52. Building a Machine Learning System 1. Users interact with the system to produce data 2. Machine learning algorithms turn that data into a model (Diagram: users interact with the system → data → machine learning → model)
• 53. Building a Machine Learning System 1. Users interact with the system to produce data 2. Machine learning algorithms turn that data into a model What happens if the model's predictions influence the user's behavior? (Diagram: users interact with the system → data → machine learning → model)
• 54. Feedback Loops 1. Users produce labelled data 2. Machine learning algorithms turn that data into a model 3. The model changes user behavior, modifying its own future training data (Diagram: users interact with the system → data → machine learning → model → model changes behavior → users)
• 55. Direct Feedback Loops • If the predictions of the machine learning system alter user behavior in such a way as to affect its own training data, then you have a feedback loop • Because the system is influencing its own training data, this can lead to gradual changes in the system's behavior over time • This can lead to behavior that is hard to predict before the system is released into production • Examples – recommender systems, search engines that use machine learned ranking models, route finding systems, ad-targeting systems
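A deliberately exaggerated toy simulation (not from the talk) shows how a direct feedback loop can compound: a "model" that always recommends the most-clicked item, where users only click what they are shown. A single-click head start ends up dominating, even though both items are equally good.

```python
# Toy direct feedback loop: the model's own output determines all of its
# future training data (clicks), so an arbitrary head start is amplified.
def simulate(rounds=100):
    clicks = {"A": 1, "B": 0}  # item A starts with one extra click, by chance
    for _ in range(rounds):
        shown = max(clicks, key=clicks.get)  # model: recommend the current leader
        clicks[shown] += 1                   # users click whatever is shown
    return clicks

final = simulate()
print(final)  # A receives every subsequent click; B never recovers
```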
• 56. Hidden Feedback Loops • You can also get complex interactions when you have two separate machine learning models that influence each other through the real world • For example, improvements to a search engine might result in more purchases of certain products, which in turn could influence a recommender system that uses item popularity as one of the features used to generate recommendations • Feedback loops, and many other challenges involved in building and maintaining machine learning systems in the wild, are covered in these two excellent papers from Google: • Machine Learning: The High-Interest Credit Card of Technical Debt • Hidden Technical Debt in Machine Learning Systems
• 57. Preventing Feedback Loops 1. Isolate a subset of data from being influenced by the model, for use in training the system • E.g. generate a subset of recommendations at random, or by using an unsupervised model • E.g. leave a small proportion of user searches un-ranked by the MLR model 2. Use a reinforcement learning model instead (such as a multi-armed bandit) - the system will dynamically adapt to the users' behavior, balancing exploring different hypotheses with exploiting what it has learned to produce accurate predictions
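The multi-armed bandit idea can be sketched with an epsilon-greedy strategy: with probability epsilon show a random variant (exploration), otherwise show the best-performing variant so far (exploitation). The arm names and rewards below are illustrative.

```python
import random

# Sketch of an epsilon-greedy multi-armed bandit over ranking variants.
class EpsilonGreedy:
    def __init__(self, arms, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in arms}   # times each arm was shown
        self.totals = {a: 0.0 for a in arms} # cumulative reward per arm

    def _mean(self, arm):
        # Unseen arms get +inf so each arm is tried at least once.
        return self.totals[arm] / self.counts[arm] if self.counts[arm] else float("inf")

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))  # explore
        return max(self.counts, key=self._mean)        # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward

# Illustrative simulation: the MLR variant always earns a click, the baseline never.
bandit = EpsilonGreedy(["mlr_ranking", "baseline_ranking"], epsilon=0.1, seed=0)
for _ in range(200):
    arm = bandit.select()
    bandit.update(arm, 1.0 if arm == "mlr_ranking" else 0.0)
```

Because exploration never stops entirely, the bandit keeps generating training data the current model did not choose, which is exactly what breaks the feedback loop.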
• 58. Summary • To improve relevancy in a search engine, it is important to first gather a golden dataset of relevancy judgements • Machine learned ranking is an excellent approach for improving relevancy • However, it mainly focuses on improving precision, not recall, and relies heavily on the quality of the top search results • Conceptual search is one approach for improving the recall of the search engine • Black box optimization algorithms can be used to auto-tune the search engine configuration to improve relevancy • Search and recommendation engines that use supervised machine learning models are prone to feedback loops, which can lead to poor quality predictions
• 59. END OF TALK – Questions?

Editor's Notes

  1. Show MJLT page (prepare it before talk)
  2. Show MJLT page (prepare it before talk)
3. Idiom examples – bob's your uncle, a bird in the hand, too many cooks spoil the broth, turn the other cheek, chip off the old block, etc.
4. Idiom examples – bob's your uncle, a bird in the hand, too many cooks spoil the broth, turn the other cheek, chip off the old block, etc.
  5. Picture taken from http://www.mlguru.cz/word2vec-jednoducha-aritmetika-se-slovy/, last accessed 9/28/2015
6. Chart taken from http://google-opensource.blogspot.com/2013/08/learning-meaning-behind-words.html – last accessed 9/28/2015