Building a Real-time, Solr-powered
         Recommendation Engine

                  Trey Grainger
         Manager, Search Technology Development
                                @



Lucene Revolution 2012 - Boston
Overview
• Overview of Search & Matching Concepts
• Recommendation Approaches in Solr:
   • Attribute-based
   • Hierarchical Classification
   • Concept-based
   • More-like-this
   • Collaborative Filtering
   • Hybrid Approaches
• Important Considerations & Advanced Capabilities
  @ CareerBuilder
My Background
Trey Grainger
   • Manager, Search Technology Development
      @ CareerBuilder.com

Relevant Background
   • Search & Recommendations
   • High-volume, N-tier Architectures
   • NLP, Relevancy Tuning, user group testing, & machine learning

Fun Side Projects
   • Founder and Chief Engineer @               .com


   • Currently co-authoring the book Solr in Action… keep an eye out for
     the early access release from Manning Publications
About Search @CareerBuilder
• Over 1 million new jobs each month
• Over 45 million actively searchable resumes
• ~250 globally distributed search servers (in
  the U.S., Europe, & Asia)
• Thousands of unique, dynamically generated
  indexes
• Hundreds of millions of search documents
• Over 1 million searches an hour
Search Products @
Redefining “Search Engine”
• “Lucene is a high-performance, full-featured
  text search engine library…”
 Yes, but really…

• Lucene is a high-performance, fully-featured
  token matching and scoring library… which
  can perform full-text searching.
Redefining “Search Engine”

 or, in machine learning speak:

• A Lucene index is a multi-dimensional
  sparse matrix… with very fast and powerful
  lookup capabilities.

• Think of each field as a matrix containing each
  term mapped to each document
The Lucene Inverted Index
              (traditional text example)
What you SEND to Lucene/Solr:

Document      Content Field
doc1          once upon a time, in a land far, far away
doc2          the cow jumped over the moon.
doc3          the quick brown fox jumped over the lazy dog.
doc4          the cat in the hat
doc5          The brown cow said “moo” once.
…             …

How the content is INDEXED into Lucene/Solr (conceptually):

Term          Documents
a             doc1 [2x]
brown         doc3 [1x], doc5 [1x]
cat           doc4 [1x]
cow           doc2 [1x], doc5 [1x]
…             …
once          doc1 [1x], doc5 [1x]
over          doc2 [1x], doc3 [1x]
the           doc2 [2x], doc3 [2x], doc4 [2x], doc5 [1x]
…             …
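The mapping above can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a naive lowercase/alphanumeric tokenizer in place of a real Lucene analyzer; the names `tokenize` and `build_inverted_index` are made up for this example.

```python
import re
from collections import Counter, defaultdict

# The five example documents from the slide.
docs = {
    "doc1": "once upon a time, in a land far, far away",
    "doc2": "the cow jumped over the moon.",
    "doc3": "the quick brown fox jumped over the lazy dog.",
    "doc4": "the cat in the hat",
    "doc5": 'The brown cow said "moo" once.',
}

def tokenize(text):
    """Lowercase and split on non-alphanumerics (a stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(documents):
    """Map each term to {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return dict(index)

index = build_inverted_index(docs)
print(index["the"])   # postings for "the", with per-document frequencies
print(index["once"])  # postings for "once"
```

Each entry in `index` corresponds to one row of the "Term / Documents" listing, with the `[Nx]` counts stored as the dictionary values.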
Match Text Queries to Text Fields

       /solr/select/?q=jobcontent:(software engineer)

Job Content Field   Documents
…                   …
engineer            doc1, doc3, doc4, doc5
…                   …
mechanical          doc2, doc4, doc6
…                   …
software            doc1, doc3, doc4, doc7, doc8
…                   …

Matching documents:
   engineer only:           doc5
   software and engineer:   doc1, doc3, doc4
   software only:           doc7, doc8
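As a rough sketch of how the query terms are matched against the postings above, the following Python counts how many query terms each document contains. It ignores real Lucene scoring (tf, idf, norms); the function name `match` is hypothetical.

```python
# Postings from the slide: term -> set of documents containing it.
postings = {
    "engineer": {"doc1", "doc3", "doc4", "doc5"},
    "mechanical": {"doc2", "doc4", "doc6"},
    "software": {"doc1", "doc3", "doc4", "doc7", "doc8"},
}

def match(terms, postings):
    """Return {doc_id: number_of_query_terms_matched} for an OR-style query."""
    hits = {}
    for term in terms:
        for doc_id in postings.get(term, set()):
            hits[doc_id] = hits.get(doc_id, 0) + 1
    return hits

hits = match(["software", "engineer"], postings)
both = sorted(d for d, n in hits.items() if n == 2)      # overlap of the two circles
only_one = sorted(d for d, n in hits.items() if n == 1)  # matched a single term
print(both, only_one)
```

Documents matching both terms land in the overlap of the diagram and would normally score highest.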
Beyond Text Searching
• Lucene/Solr is a matching engine, not merely a text search engine

• When Lucene/Solr searches text, it is matching
  tokens in the query with tokens in the index

• Anything that can be searched upon can form the
  basis of matching and scoring:
  – text, attributes, locations, results of functions, user
    behavior, classifications, etc.
Business Case for Recommendations

• For companies like CareerBuilder, recommendations
  can provide as much business value as user-driven
  search capabilities, or even more (e.g. views, sales,
  job applications).

• Recommendations create stickiness to pull users
  back to your company’s website, app, etc.

• What are recommendations?
    … searches of relevant content for a user
Approaches to Recommendations
• Content-based
   – Attribute based
       • e.g. income level, hobbies, location, experience
   – Hierarchical
       • e.g. “medical//nursing//oncology”, “animal//dog//terrier”
   – Textual Similarity
       • e.g. Solr’s MoreLikeThis Request Handler & Search Handler
   – Concept Based
       • e.g. Solr => “software engineer”, “java”, “search”, “open source”


• Behavioral Based
       • Collaborative Filtering: “Users who liked that also liked this…”

• Hybrid Approaches
Content-based Recommendation Approaches
Attribute-based Recommendations
• Example: Match User Attributes to Item Attribute Fields
   Janes_Profile:{
       Industry:"healthcare",
       Locations:"Boston, MA",
       JobTitle:"Nurse Educator",
       Salary:{ min:40000, max:60000 }
   }


   /solr/select/?q=(jobtitle:"nurse educator"^25 OR
   jobtitle:(nurse educator)^10) AND ((city:"Boston" AND
   state:"MA")^15 OR state:"MA") AND
   _val_:"map(salary,40000,60000,10,0)"

   //by mapping the importance of each attribute to weights based upon
   your business domain, you can easily find results which match your
   customer’s profile without the user having to initiate a search.
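One way to generate such a query from a stored profile is sketched below. The function name `build_attribute_query` and the profile layout are assumptions for illustration; the boosts are simply the ones shown on the slide, and no escaping of Lucene special characters is attempted.

```python
from urllib.parse import urlencode

# Profile data as shown on the slide.
janes_profile = {
    "Industry": "healthcare",
    "Locations": "Boston, MA",
    "JobTitle": "Nurse Educator",
    "Salary": {"min": 40000, "max": 60000},
}

def build_attribute_query(profile):
    """Turn profile attributes into a boosted Solr query string (hypothetical helper)."""
    title = profile["JobTitle"].lower()
    city, state = [part.strip() for part in profile["Locations"].split(",")]
    lo, hi = profile["Salary"]["min"], profile["Salary"]["max"]
    return (
        f'(jobtitle:"{title}"^25 OR jobtitle:({title})^10) AND '
        f'((city:"{city}" AND state:"{state}")^15 OR state:"{state}") AND '
        f'_val_:"map(salary,{lo},{hi},10,0)"'
    )

q = build_attribute_query(janes_profile)
url = "/solr/select/?" + urlencode({"q": q})  # URL-encode before sending to Solr
print(q)
```

Keeping the weights in one place like this makes it easy to tune them per business domain without touching the index.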
Hierarchical Recommendations
• Example: Match User Attributes to Item Attribute Fields
   Janes_Profile:{
       MostLikelyCategory:"healthcare//nursing//oncology",
       2ndMostLikelyCategory:"healthcare//nursing//transplant",
       3rdMostLikelyCategory:"educator//postsecondary//nursing", …
   }

   /solr/select/?q=category:(
               ("healthcare.nursing.oncology"^40
               OR "healthcare.nursing"^20
               OR "healthcare"^10)
                        OR
               ("healthcare.nursing.transplant"^20
               OR "healthcare.nursing"^10
               OR "healthcare"^5)
                        OR
               ("educator.postsecondary.nursing"^10
               OR "educator.postsecondary"^5
               OR "educator")
               )
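The pattern in the query above (each ancestor level gets a smaller boost, and each less likely category starts from a smaller boost) can be generated programmatically. The sketch below assumes a simple halving rule, which reproduces the slide's numbers except that the final "educator" term gets ^2.5 rather than no boost. The helper names are hypothetical.

```python
def category_clause(path, top_boost, separator="//"):
    """Expand 'a//b//c' into ("a.b.c"^B OR "a.b"^B/2 OR "a"^B/4)."""
    levels = path.split(separator)
    parts = []
    boost = float(top_boost)
    for depth in range(len(levels), 0, -1):
        label = ".".join(levels[:depth])
        parts.append(f'"{label}"^{boost:g}')
        boost /= 2  # assumed rule: halve the boost for each step up the hierarchy
    return "(" + " OR ".join(parts) + ")"

def hierarchical_query(ranked_categories, top_boost=40):
    """Combine clauses for categories ordered from most to least likely."""
    clauses = []
    boost = float(top_boost)
    for path in ranked_categories:
        clauses.append(category_clause(path, boost))
        boost /= 2  # assumed rule: halve the boost for each less likely category
    return "category:(" + " OR ".join(clauses) + ")"

q = hierarchical_query([
    "healthcare//nursing//oncology",
    "healthcare//nursing//transplant",
    "educator//postsecondary//nursing",
])
print(q)
```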
Textual Similarity-based Recommendations
• Solr’s More Like This Request Handler / Search Handler are a good
  example of this.

• Essentially, “important keywords” are extracted from one or more
  documents and turned into a search.

• This results in secondary search results which demonstrate
  textual similarity to the original document(s)

• See http://wiki.apache.org/solr/MoreLikeThis for example usage

• Currently no distributed search support (but a patch is available)
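To make the "important keywords turned into a search" idea concrete, here is a conceptual sketch in Python. It is not Solr's MoreLikeThis implementation: it assumes a plain tf × log(N/df) weighting and a naive tokenizer, and the names `interesting_terms` and `mlt_query` are invented for this example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Naive tokenizer standing in for a real analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def interesting_terms(doc_id, docs, top_n=3):
    """Rank the terms of one document by tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))  # document frequency of each term
    tf = Counter(tokenize(docs[doc_id]))
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [(t, scores[t]) for t in ranked[:top_n] if scores[t] > 0]

def mlt_query(field, terms):
    """Turn weighted terms into a boosted query string."""
    return f"{field}:(" + " OR ".join(f"{t}^{s:.2f}" for t, s in terms) + ")"

# Hypothetical mini-corpus for illustration only.
docs = {
    "a": "oncology nurse oncology ward",
    "b": "nurse educator",
    "c": "software engineer",
}
terms = interesting_terms("a", docs, top_n=2)
print(mlt_query("jobcontent", terms))
```

Running the resulting query against the index returns documents that share the source document's most distinctive terms.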
Concept Based Recommendations
Approaches:
1) Create a Taxonomy/Dictionary to define your
    concepts and then either:
       a) manually tag documents as they come in
           //Very hard to scale… see Amazon Mechanical Turk if you must do this
  or
       b) create a classification system which automatically tags
          content as it comes in (supervised machine learning)
           //See Apache Mahout


2) Use an unsupervised machine learning algorithm to
   cluster documents and dynamically discover concepts
   (no dictionary required).
    //This is already built into Solr using Carrot2!
How Clustering Works
Setting Up Clustering in SolrConfig.xml
<searchComponent name="clustering" enable="true"
class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">
         org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
  </lst>
</searchComponent>

<requestHandler name="/clustering" enable="true" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <str name="fl">*,score</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
Clustering Search in Solr
• /solr/clustering/?q=content:nursing
    &rows=100
    &carrot.title=titlefield
    &carrot.snippet=titlefield
    &LingoClusteringAlgorithm.desiredClusterCountBase=25
    &group=false //clustering & grouping don’t currently play nicely

• Allows you to dynamically identify “concepts” and their
  prevalence within a user’s top search results
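Once the clustering handler responds, the concept labels and their sizes can be pulled out of the parsed JSON. The sketch below assumes the response carries a top-level "clusters" list whose entries have "labels" and "docs" keys, and that the catch-all cluster is flagged with "other-topics"; verify this against your Solr version. The sample response is fabricated purely to exercise the function.

```python
def cluster_concepts(response):
    """Return [(label, document_count), ...] sorted by prevalence."""
    concepts = []
    for cluster in response.get("clusters", []):
        labels = cluster.get("labels", [])
        if not labels or cluster.get("other-topics"):
            continue  # skip unlabeled clusters and the catch-all bucket
        concepts.append((labels[0], len(cluster.get("docs", []))))
    return sorted(concepts, key=lambda c: -c[1])

# Hypothetical response fragment (structure assumed, values made up).
sample_response = {
    "clusters": [
        {"labels": ["Registered Nurse"], "docs": ["j1", "j2", "j3"]},
        {"labels": ["Nurse Educator"], "docs": ["j4", "j5"]},
        {"labels": ["Other Topics"], "other-topics": True, "docs": ["j6"]},
    ]
}
print(cluster_concepts(sample_response))
```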
Search:   Nursing
Search:   .Net
Example Concept-based Recommendation
   Stage 1: Identify Concepts
 Original Query: q=(solr OR lucene)
  // can be a user’s search, their job title, a list of skills,
  // or any other keyword-rich data source

 Clusters Identified:
   Developer (22)
   Java Developer (13)
   Software (10)
   Senior Java Developer (9)
   Architect (6)
   Software Engineer (6)
   Web Developer (5)
   Search (3)
   Software Developer (3)
   Systems (3)
   Administrator (2)
   Hadoop Engineer (2)
   Java J2EE (2)
   Search Development (2)
   Software Architect (2)
   Solutions Architect (2)

 Facets Identified (occupation):
   Computer Software Engineers
   Web Developers
   ...
Example Concept-based Recommendation
 Stage 2: Run Recommendations Search
q=content:("Developer"^22 OR "Java Developer"^13 OR
"Software"^10 OR "Senior Java Developer"^9 OR "Architect"^6 OR
"Software Engineer"^6 OR "Web Developer"^5 OR "Search"^3 OR
"Software Developer"^3 OR "Systems"^3 OR "Administrator"^2 OR
"Hadoop Engineer"^2 OR "Java J2EE"^2 OR "Search Development"^2 OR
"Software Architect"^2 OR "Solutions Architect"^2) AND
occupation:("Computer Software Engineers" OR "Web
Developers")

// You can also add the user’s location or the original keywords to the
// recommendations search if it helps results quality for your use-case.
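Building the Stage 2 query from the Stage 1 output is mostly string assembly. A minimal sketch, assuming the cluster sizes are used directly as boosts (as on the slide) and using a hypothetical helper named `concept_query`:

```python
def quote(phrase):
    """Wrap a phrase in double quotes, escaping embedded quotes."""
    return '"' + phrase.replace('"', '\\"') + '"'

def concept_query(clusters, occupations, field="content"):
    """clusters: [(label, size), ...]; occupations: facet values to filter on."""
    boosted = " OR ".join(f"{quote(label)}^{size}" for label, size in clusters)
    occupation = " OR ".join(quote(o) for o in occupations)
    return f"{field}:({boosted}) AND occupation:({occupation})"

q = concept_query(
    [("Developer", 22), ("Java Developer", 13), ("Software", 10)],
    ["Computer Software Engineers", "Web Developers"],
)
print(q)
```

Note that Lucene's boolean operators must be uppercase (OR, AND); lowercase "or" would be parsed as an ordinary search term.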
Example Concept-based Recommendation
Stage 3: Returning the Recommendations




Important Side-bar: Geography
Geography and Recommendations
• Filtering or boosting results based upon geographical area or
  distance can help greatly for certain use cases:
   – Jobs/Resumes, Tickets/Concerts, Restaurants


• For other use cases, location sensitivity is nearly worthless:
   – Books, Songs, Movies

   /solr/select/?q=(Standard Recommendation Query) AND
   _val_:"(recip(geodist(location, 40.7142, -74.0064),1,1,0))"


   // there are dozens of well-documented ways to search/filter/sort/boost
   // on geography in Solr. This is just one example.
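To see what the distance boost does numerically, the sketch below reimplements the two functions in plain Python: Solr's `recip(x,m,a,b)` computes a/(m*x+b), and `geodist` is approximated here with a haversine distance in kilometers. The coordinates assume the slide's point is New York City (longitude negative, i.e. west), and the Boston coordinates are an illustrative addition.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (an approximation of geodist)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def recip(x, m, a, b):
    """Reciprocal function as defined by Solr: a / (m*x + b)."""
    return a / (m * x + b)

nyc = (40.7142, -74.0064)     # assumed to be the slide's reference point
boston = (42.3601, -71.0589)  # hypothetical job location

distance = haversine_km(*nyc, *boston)
boost = recip(distance, 1, 1, 0)  # with (1,1,0) this is simply 1/distance
print(round(distance), boost)
```

With the parameters (1,1,0) the boost is undefined at a distance of exactly zero, so a nonzero `b` may be a safer choice in practice.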
Behavior-based Recommendation Approaches
          (Collaborative Filtering)
The Lucene Inverted Index
               (user behavior example)
What you SEND to Lucene/Solr:

Document      “Users who bought this product” Field
doc1          user1, user4, user5
doc2          user2, user3
doc3          user4
doc4          user4, user5
doc5          user4, user1
…             …

How the content is INDEXED into Lucene/Solr (conceptually):

Term          Documents
user1         doc1, doc5
user2         doc2
user3         doc2
user4         doc1, doc3, doc4, doc5
user5         doc1, doc4
…             …
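The inversion shown above is the same operation as for text, only with user IDs as the tokens. A small sketch (the function name `invert` is just for illustration):

```python
from collections import defaultdict

# "Users who bought this product" field, per document, as on the slide.
doc_users = {
    "doc1": ["user1", "user4", "user5"],
    "doc2": ["user2", "user3"],
    "doc3": ["user4"],
    "doc4": ["user4", "user5"],
    "doc5": ["user4", "user1"],
}

def invert(doc_to_users):
    """Build the user -> documents postings from the document -> users field."""
    index = defaultdict(list)
    for doc_id in sorted(doc_to_users):
        for user in doc_to_users[doc_id]:
            index[user].append(doc_id)
    return dict(index)

user_docs = invert(doc_users)
print(user_docs["user4"])
```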
Collaborative Filtering
• Step 1: Find similar users who like the same documents
                q=documentid:("doc1" OR "doc4")

 Document   “Users who bought this product” Field
 doc1       user1, user4, user5
 doc2       user2, user3
 doc3       user4
 doc4       user4, user5
 doc5       user4, user1
 …          …

 Users matched per document:
    doc1:   user1, user4, user5
    doc4:   user4, user5

 Top Scoring Results (Most Similar Users):
 1) user5 (2 shared likes)
 2) user4 (2 shared likes)
 3) user1 (1 shared like)
Collaborative Filtering
• Step 2: Search for docs “liked” by those similar users
Most Similar Users:
1) user5 (2 shared likes)
2) user4 (2 shared likes)
3) user1 (1 shared like)

/solr/select/?q=userlikes:("user5"^2 OR "user4"^2 OR "user1"^1)

Term          Documents
user1         doc1, doc5
user2         doc2
user3         doc2
user4         doc1, doc3, doc4, doc5
user5         doc1, doc4
…             …

Top Recommended Documents:
1) doc1 (matches user4, user5, user1)
2) doc4 (matches user4, user5)
3) doc5 (matches user4, user1)
4) doc3 (matches user4)

//doc2 does not match
//above example ignores idf calculations
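Both steps can be simulated outside Solr to check the arithmetic. The sketch below uses the slide's example data, scores by summed user weights, and (like the slide) ignores idf; the function names are invented for this illustration.

```python
from collections import Counter

doc_users = {
    "doc1": ["user1", "user4", "user5"],
    "doc2": ["user2", "user3"],
    "doc3": ["user4"],
    "doc4": ["user4", "user5"],
    "doc5": ["user4", "user1"],
}

def similar_users(liked_docs, doc_users):
    """Step 1: count how many of the liked documents each user shares."""
    counts = Counter()
    for doc_id in liked_docs:
        counts.update(doc_users.get(doc_id, []))
    return counts

def recommend(user_weights, doc_users):
    """Step 2: score each document by the summed weights of the users who liked it."""
    scores = {}
    for doc_id, users in doc_users.items():
        score = sum(user_weights.get(u, 0) for u in users)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

weights = similar_users(["doc1", "doc4"], doc_users)
ranking = recommend(weights, doc_users)
print(ranking)
```

In a real system you would usually filter out documents the user has already seen (doc1 and doc4 here) before presenting the recommendations.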
Lots of Variations
•   Users –> Item(s)
•   User –> Item(s) –> Users
•   Item –> Users –> Item(s)
•   etc.
                     User 1   User 2   User 3   User 4   …
            Item 1   X        X        X                 …
            Item 2            X                 X        …
            Item 3            X        X                 …
            Item 4                              X        …
            …        …        …        …        …        …

Note: Just because this example tags with “users” doesn’t mean you have to.
You can map any entity to any other related entity and achieve a similar result.
Comparison with Mahout
•   Recommendations are much easier for us to perform in Solr:
     –   Data is already present and up-to-date
     –   Doesn’t require writing significant code to make changes (just changing queries)
     –   Recommendations are real-time as opposed to asynchronously processed off-line.
     –   Allows easy utilization of any content and available functions to boost results

•   Our initial tests show our collaborative filtering approach in Solr significantly
    outperforms our Mahout tests in terms of results quality
     – Note: We believe that some portion of the quality issues we have with the Mahout
       implementation has to do with staleness of data, given how frequently our data is
       updated.

•   Our general takeaway:
     –   We believe that Mahout might be able to return better matches than Solr with a lot of custom
         work, but it does not perform better for us out of the box.

•   Because we already scale…
     – Since we already have all of our data indexed in Solr (tens to hundreds of millions of documents),
       there’s no need for us to rebuild a sparse matrix in Hadoop (your needs may be different).
Hybrid Recommendation Approaches
Hybrid Approaches
• Not much to say here, I think you get the point.

• /solr/select/?q=(category:("healthcare.nursing.oncology"^10
  OR "healthcare.nursing"^5 OR "healthcare") OR title:"Nurse
  Educator"^15) AND _val_:"map(salary,40000,60000,10,0)"^5
  AND _val_:"(recip(geodist(location, 40.7142,
  -74.0064),1,1,0))"

• Combining multiple approaches generally yields better overall
  results if done intelligently. Experimentation is key here.
Important Considerations &
Advanced Capabilities @ CareerBuilder
Important Considerations @
            CareerBuilder

• Payload Scoring
• Measuring Results Quality
• Understanding our Users
Custom Scoring with Payloads
•   In addition to boosting search terms and fields, content within the same field can also
    be boosted differently using Payloads (requires a custom scoring implementation):

•   Content Field:
         design [1] / engineer [1] / really [ ] / great [ ] / job [ ] / ten [3] / years [3] /
         experience [3] / careerbuilder [2] / design [2], …

     Payload Bucket Mappings:
     jobtitle: bucket=[1] boost=10; company: bucket=[2] boost=4;
        jobdescription: bucket=[ ] boost=1; experience: bucket=[3] boost=1.5

     We can pass in a parameter to Solr at query time specifying the boost to apply to each
     bucket, e.g. …&bucketWeights=1:10;2:4;3:1.5;default:1;

•   This allows us to map many relevancy buckets to search terms at index time and adjust
    the weighting at query time without having to search across hundreds of fields.

•   By making all scoring parameters overridable at query time, we are able to do A/B
    testing to consistently improve our relevancy model.
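The bucket idea can be illustrated without Lucene at all. The sketch below is not CareerBuilder's custom scorer; it simply parses a `bucketWeights`-style string and sums the weight of every occurrence of a query term, using the example tokens above. The function names and the plain-sum scoring are assumptions.

```python
def parse_bucket_weights(param):
    """Parse '1:10;2:4;3:1.5;default:1;' into {'1': 10.0, ..., 'default': 1.0}."""
    weights = {}
    for pair in param.split(";"):
        if not pair:
            continue  # tolerate the trailing semicolon
        bucket, weight = pair.split(":")
        weights[bucket] = float(weight)
    return weights

# (token, payload bucket) pairs from the example content field; None = no payload.
tokens = [
    ("design", "1"), ("engineer", "1"), ("really", None), ("great", None),
    ("job", None), ("ten", "3"), ("years", "3"), ("experience", "3"),
    ("careerbuilder", "2"), ("design", "2"),
]

def score(term, tokens, weights):
    """Sum the bucket weight of every occurrence of the term."""
    default = weights.get("default", 1.0)
    return sum(weights.get(bucket, default) for tok, bucket in tokens if tok == term)

weights = parse_bucket_weights("1:10;2:4;3:1.5;default:1;")
print(score("design", tokens, weights))
```

Because the weights arrive at query time, two requests against the same index can rank results differently, which is what makes per-request A/B testing possible.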
Measuring Results Quality
• A/B Testing is key to understanding our search results quality.

• Users are randomly divided into equal-sized groups

• Each group experiences a different algorithm for the duration of the
  test

• We can measure “performance” of the algorithm based upon
  changes in user behavior:
    – For us, more job applications = more relevant results
    – For other companies, that might translate into products purchased, additional
      friends requested, or non-search pages viewed

• We use this to test both keyword search results and also
  recommendations quality
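A minimal sketch of the mechanics, assuming users are assigned by hashing a user ID (so each user stays in the same group for the duration of the test) and that "performance" is the job application rate per group. The function names and event format are hypothetical.

```python
import hashlib

def assign_group(user_id, test_name, n_groups=2):
    """Deterministically map a user to one of n roughly equal groups."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_groups

def application_rate(events):
    """events: iterable of (group, applied) pairs -> {group: share who applied}."""
    totals, applied = {}, {}
    for group, did_apply in events:
        totals[group] = totals.get(group, 0) + 1
        applied[group] = applied.get(group, 0) + (1 if did_apply else 0)
    return {g: applied[g] / totals[g] for g in totals}

# Made-up events purely to exercise the function.
rates = application_rate([(0, True), (0, False), (1, True), (1, True)])
print(assign_group("user42", "recs-v2"), rates)
```

A real test would also need enough traffic per group and a significance check before declaring a winner.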
Understanding our Users
(given limited information)
Understanding Our Users
• Machine learning algorithms can help us understand what
  matters most to different groups of users.

         Example: Willingness to relocate for a job (miles per percentile)
         [Chart: y-axis in miles, 0 to 2,500; x-axis in percentiles, 1% to 95%.
          Series: Title Examiners, Abstractors, and Searchers; Software Developers,
          Systems Software; Food Preparation Workers.]
Key Takeaways
• Recommendations can be as valuable as keyword
  search, or more so.

• If your data fits in Solr then you have everything
  you need to build an industry-leading
  recommendation system

• Even a single keyword can be enough to begin
  making meaningful recommendations. Build up
  intelligently from there.
Contact Info
   Trey Grainger
                trey.grainger@careerbuilder.com
                http://www.careerbuilder.com
                @treygrainger




And yes, we are hiring – come chat with me if you are interested.

More Related Content

What's hot

Introduction à ElasticSearch
Introduction à ElasticSearchIntroduction à ElasticSearch
Introduction à ElasticSearch
Fadel Chafai
 
Solrで多様なランキングモデルを活用するためのプラグイン開発 #SolrJP
Solrで多様なランキングモデルを活用するためのプラグイン開発 #SolrJPSolrで多様なランキングモデルを活用するためのプラグイン開発 #SolrJP
Solrで多様なランキングモデルを活用するためのプラグイン開発 #SolrJP
Yahoo!デベロッパーネットワーク
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
DataWorks Summit
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Lucidworks
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
lucenerevolution
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
Sadayuki Furuhashi
 
Distributed tracing using open tracing &amp; jaeger 2
Distributed tracing using open tracing &amp; jaeger 2Distributed tracing using open tracing &amp; jaeger 2
Distributed tracing using open tracing &amp; jaeger 2
Chandresh Pancholi
 
Dense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdfDense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdf
Sease
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
Hortonworks
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Pattern
confluent
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
Saumitra Srivastav
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
searchbox-com
 
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer SimonDocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
lucenerevolution
 
Introduction to Distributed Tracing
Introduction to Distributed TracingIntroduction to Distributed Tracing
Introduction to Distributed Tracing
petabridge
 
NiFi 시작하기
NiFi 시작하기NiFi 시작하기
NiFi 시작하기
Byunghwa Yoon
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리
Junyi Song
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solr
Knoldus Inc.
 
Apache Storm - Introduction au traitement temps-réel avec Storm
Apache Storm - Introduction au traitement temps-réel avec StormApache Storm - Introduction au traitement temps-réel avec Storm
Apache Storm - Introduction au traitement temps-réel avec Storm
Paris_Storm_UG
 
社内Java8勉強会 ラムダ式とストリームAPI
社内Java8勉強会 ラムダ式とストリームAPI社内Java8勉強会 ラムダ式とストリームAPI
社内Java8勉強会 ラムダ式とストリームAPI
Akihiro Ikezoe
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 

What's hot (20)

Introduction à ElasticSearch
Introduction à ElasticSearchIntroduction à ElasticSearch
Introduction à ElasticSearch
 
Solrで多様なランキングモデルを活用するためのプラグイン開発 #SolrJP
Solrで多様なランキングモデルを活用するためのプラグイン開発 #SolrJPSolrで多様なランキングモデルを活用するためのプラグイン開発 #SolrJP
Solrで多様なランキングモデルを活用するためのプラグイン開発 #SolrJP
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Distributed tracing using open tracing &amp; jaeger 2
Distributed tracing using open tracing &amp; jaeger 2Distributed tracing using open tracing &amp; jaeger 2
Distributed tracing using open tracing &amp; jaeger 2
 
Dense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdfDense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdf
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Pattern
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
 
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer SimonDocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
 
Introduction to Distributed Tracing
Introduction to Distributed TracingIntroduction to Distributed Tracing
Introduction to Distributed Tracing
 
NiFi 시작하기
NiFi 시작하기NiFi 시작하기
NiFi 시작하기
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solr
 
Apache Storm - Introduction au traitement temps-réel avec Storm
Apache Storm - Introduction au traitement temps-réel avec StormApache Storm - Introduction au traitement temps-réel avec Storm
Apache Storm - Introduction au traitement temps-réel avec Storm
 
社内Java8勉強会 ラムダ式とストリームAPI
社内Java8勉強会 ラムダ式とストリームAPI社内Java8勉強会 ラムダ式とストリームAPI
社内Java8勉強会 ラムダ式とストリームAPI
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 

Viewers also liked

Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Lucidworks
 
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartNear Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Lucidworks
 
State-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache SolrState-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache Solr
Robert Douglass
 
Netflix Global Search - Lucene Revolution
Netflix Global Search - Lucene RevolutionNetflix Global Search - Lucene Revolution
Netflix Global Search - Lucene Revolution
ivan provalov
 
Collaborative filtering for recommendation systems in Python, Nicolas Hug
Collaborative filtering for recommendation systems in Python, Nicolas HugCollaborative filtering for recommendation systems in Python, Nicolas Hug
Collaborative filtering for recommendation systems in Python, Nicolas Hug
Pôle Systematic Paris-Region
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
NYC Predictive Analytics
 

Viewers also liked (6)

Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
 
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartNear Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
 
State-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache SolrState-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache Solr
 
Netflix Global Search - Lucene Revolution
Netflix Global Search - Lucene RevolutionNetflix Global Search - Lucene Revolution
Netflix Global Search - Lucene Revolution
 
Collaborative filtering for recommendation systems in Python, Nicolas Hug
Collaborative filtering for recommendation systems in Python, Nicolas HugCollaborative filtering for recommendation systems in Python, Nicolas Hug
Collaborative filtering for recommendation systems in Python, Nicolas Hug
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
 

Similar to Building a real time, solr-powered recommendation engine

Building a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation EngineBuilding a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation Engine
lucenerevolution
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Building a real time, solr-powered recommendation engine

  • 1. Building a Real-time, Solr-powered Recommendation Engine. Trey Grainger, Manager, Search Technology Development @ CareerBuilder. Lucene Revolution 2012 - Boston
  • 2. Overview
      • Overview of Search & Matching Concepts
      • Recommendation Approaches in Solr:
          • Attribute-based
          • Hierarchical Classification
          • Concept-based
          • More-like-this
          • Collaborative Filtering
          • Hybrid Approaches
      • Important Considerations & Advanced Capabilities @ CareerBuilder
  • 3. My Background Trey Grainger • Manager, Search Technology Development @ CareerBuilder.com Relevant Background • Search & Recommendations • High-volume, N-tier Architectures • NLP, Relevancy Tuning, user group testing, & machine learning Fun Side Projects • Founder and Chief Engineer @ .com • Currently co-authoring Solr in Action book… keep your eyes out for the early access release from Manning Publications
  • 4. About Search @CareerBuilder • Over 1 million new jobs each month • Over 45 million actively searchable resumes • ~250 globally distributed search servers (in the U.S., Europe, & Asia) • Thousands of unique, dynamically generated indexes • Hundreds of millions of search documents • Over 1 million searches an hour
  • 6. Redefining “Search Engine” • “Lucene is a high-performance, full-featured text search engine library…” Yes, but really… • Lucene is a high-performance, fully-featured token matching and scoring library… which can perform full-text searching.
  • 7. Redefining “Search Engine” or, in machine learning speak: • A Lucene index is a multi-dimensional sparse matrix… with very fast and powerful lookup capabilities. • Think of each field as a matrix containing each term mapped to each document
  • 8. The Lucene Inverted Index (traditional text example)
      What you SEND to Lucene/Solr:
          Document   Content Field
          doc1       once upon a time, in a land far, far away
          doc2       the cow jumped over the moon.
          doc3       the quick brown fox jumped over the lazy dog.
          doc4       the cat in the hat
          doc5       The brown cow said “moo” once.
          …          …
      How the content is INDEXED into Lucene/Solr (conceptually):
          Term       Documents
          a          doc1 [2x]
          brown      doc3 [1x], doc5 [1x]
          cat        doc4 [1x]
          cow        doc2 [1x], doc5 [1x]
          …          …
          once       doc1 [1x], doc5 [1x]
          over       doc2 [1x], doc3 [1x]
          the        doc2 [2x], doc3 [2x], doc4 [2x], doc5 [1x]
          …          …
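
The term-to-document mapping on this slide can be reproduced with a few lines of Python. This is only a conceptual sketch: the `tokenize` helper is a hypothetical stand-in for a real Lucene analyzer, and the nested dictionary is an assumption made for readability, not how Lucene stores postings on disk.

```python
import re
from collections import Counter, defaultdict

# The five example documents from the slide.
DOCS = {
    "doc1": "once upon a time, in a land far, far away",
    "doc2": "the cow jumped over the moon.",
    "doc3": "the quick brown fox jumped over the lazy dog.",
    "doc4": "the cat in the hat",
    "doc5": 'The brown cow said "moo" once.',
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a toy analyzer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a dictionary of {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index


index = build_inverted_index(DOCS)
for term in ("a", "brown", "cow", "once", "the"):
    print(term, "->", index[term])
```

Each printed row corresponds to one row of the "Term / Documents" table above, with the term frequency playing the role of the `[2x]` annotations.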
  • 9. Match Text Queries to Text Fields
      /solr/select/?q=jobcontent:(software engineer)
      Job Content Field:
          Term         Documents
          …            …
          engineer     doc1, doc3, doc4, doc5
          …            …
          mechanical   doc2, doc4, doc6
          …            …
          software     doc1, doc3, doc4, doc7, doc8
          …            …
      Matching documents:
          engineer only:            doc5
          software and engineer:    doc1, doc3, doc4
          software only:            doc7, doc8
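
Matching the query against the field then reduces to set operations over the posting lists. The sketch below copies the postings from the slide; the function names `match_any` and `match_all` are illustrative and do not correspond to any Solr API.

```python
# Postings for the jobcontent field, copied from the slide (term -> documents).
POSTINGS = {
    "engineer": {"doc1", "doc3", "doc4", "doc5"},
    "mechanical": {"doc2", "doc4", "doc6"},
    "software": {"doc1", "doc3", "doc4", "doc7", "doc8"},
}


def match_any(postings, terms):
    """Documents containing at least one query term (OR semantics)."""
    result = set()
    for term in terms:
        result |= postings.get(term, set())
    return result


def match_all(postings, terms):
    """Documents containing every query term (AND semantics)."""
    sets = [postings.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


query = ["software", "engineer"]
print(sorted(match_any(POSTINGS, query)))  # every document in either circle
print(sorted(match_all(POSTINGS, query)))  # only the overlap of the two circles
```

A real engine would also score each match, so documents in the overlap would typically rank above documents matching only one term.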
  • 10. Beyond Text Searching • Lucene/Solr is a text search matching engine • When Lucene/Solr search text, they are matching tokens in the query with tokens in the index • Anything that can be searched upon can form the basis of matching and scoring: – text, attributes, locations, results of functions, user behavior, classifications, etc.
  • 11. Business Case for Recommendations • For companies like CareerBuilder, recommendations can provide as much or even greater business value (i.e. views, sales, job applications) than user-driven search capabilities. • Recommendations create stickiness to pull users back to your company’s website, app, etc. • What are recommendations? … searches of relevant content for a user
  • 12. Approaches to Recommendations
      • Content-based
          – Attribute based
              • e.g. income level, hobbies, location, experience
          – Hierarchical
              • e.g. “medical//nursing//oncology”, “animal//dog//terrier”
          – Textual Similarity
              • e.g. Solr’s MoreLikeThis Request Handler & Search Handler
          – Concept Based
              • e.g. Solr => “software engineer”, “java”, “search”, “open source”
      • Behavioral Based
          • Collaborative Filtering: “Users who liked that also liked this…”
      • Hybrid Approaches
  • 14. Attribute-based Recommendations
      • Example: Match User Attributes to Item Attribute Fields
        Janes_Profile:{
          Industry:"healthcare",
          Locations:"Boston, MA",
          JobTitle:"Nurse Educator",
          Salary:{ min:40000, max:60000 }
        }
        /solr/select/?q=(jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10)
          AND ((city:"Boston" AND state:"MA")^15 OR state:"MA")
          AND _val_:"map(salary,40000,60000,10,0)"
        // By mapping the importance of each attribute to weights based upon your business domain, you can easily find results which match your customer’s profile without the user having to initiate a search.
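
A profile-to-query translation like the one on this slide could be generated in code. The following is a minimal sketch, assuming the profile arrives as a Python dictionary shaped like `Janes_Profile`; the function name `build_attribute_query` is hypothetical, the boosts (25, 10, 15) are simply the slide's example weights, and no escaping of Lucene special characters is attempted.

```python
from urllib.parse import urlencode

# Profile shaped like Janes_Profile on the slide.
JANES_PROFILE = {
    "Industry": "healthcare",
    "Locations": "Boston, MA",
    "JobTitle": "Nurse Educator",
    "Salary": {"min": 40000, "max": 60000},
}


def build_attribute_query(profile):
    """Turn a user profile into a boosted Solr query string (weights are illustrative)."""
    title = profile["JobTitle"].lower()
    city, state = [part.strip() for part in profile["Locations"].split(",")]
    low, high = profile["Salary"]["min"], profile["Salary"]["max"]
    clauses = [
        # An exact phrase match on the title outranks a loose term match.
        f'(jobtitle:"{title}"^25 OR jobtitle:({title})^10)',
        # Same city and state outranks same state only.
        f'((city:"{city}" AND state:"{state}")^15 OR state:"{state}")',
        # Function query: contributes 10 when salary is within range, else 0.
        f'_val_:"map(salary,{low},{high},10,0)"',
    ]
    return " AND ".join(clauses)


query = build_attribute_query(JANES_PROFILE)
print(query)
# URL-encode the query before sending it to /solr/select/.
print("/solr/select/?" + urlencode({"q": query}))
```

Keeping the weights in one function makes it easier to tune them per business domain, which is the point the slide makes about mapping attribute importance to boosts.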
  • 15. Hierarchical Recommendations
      • Example: Match User Attributes to Item Attribute Fields
        Janes_Profile:{
          MostLikelyCategory:"healthcare//nursing//oncology",
          2ndMostLikelyCategory:"healthcare//nursing//transplant",
          3rdMostLikelyCategory:"educator//postsecondary//nursing",
          …
        }
        /solr/select/?q=category:(
          ("healthcare.nursing.oncology"^40 OR "healthcare.nursing"^20 OR "healthcare"^10)
          OR ("healthcare.nursing.transplant"^20 OR "healthcare.nursing"^10 OR "healthcare"^5)
          OR ("educator.postsecondary.nursing"^10 OR "educator.postsecondary"^5 OR "educator")
        )
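
The pattern in this query is that each category path is expanded into itself plus its ancestors, with the boost shrinking at each level up. A rough sketch of that expansion follows. The helper names are hypothetical, and halving the boost per level is an assumption that reproduces most of the slide's numbers (the slide leaves the final "educator" clause unboosted, whereas this sketch emits ^2.5).

```python
def expand_category(path, top_boost, separator="."):
    """Return [(category, boost), ...] for a path and each ancestor, halving the boost per level."""
    parts = path.split(separator)
    clauses = []
    boost = top_boost
    for depth in range(len(parts), 0, -1):
        clauses.append((separator.join(parts[:depth]), boost))
        boost = boost / 2
    return clauses


def build_hierarchical_query(ranked_categories, field="category"):
    """Combine several (path, top_boost) pairs into one boosted query string."""
    groups = []
    for path, top_boost in ranked_categories:
        terms = " OR ".join(
            f'"{category}"^{boost:g}' for category, boost in expand_category(path, top_boost)
        )
        groups.append(f"({terms})")
    return f"{field}:(" + " OR ".join(groups) + ")"


# The three categories from Janes_Profile, most likely first.
ranked = [
    ("healthcare.nursing.oncology", 40),
    ("healthcare.nursing.transplant", 20),
    ("educator.postsecondary.nursing", 10),
]
print(build_hierarchical_query(ranked))
```

Because broader ancestors still match with a lower boost, a job tagged only "healthcare.nursing" is still recommended, just ranked below an exact "oncology" match.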
  • 16. Textual Similarity-based Recommendations • Solr’s More Like This Request Handler / Search Handler are a good example of this. • Essentially, “important keywords” are extracted from one or more documents and turned into a search. • This results in secondary search results which demonstrate textual similarity to the original document(s) • See http://wiki.apache.org/solr/MoreLikeThis for example usage • Currently no distributed search support (but a patch is available)
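
The idea of extracting "important keywords" and turning them into a search can be illustrated with a toy TF-IDF ranking. This is not Solr's MoreLikeThis implementation; it is a simplified stand-in, and the corpus, the `content` field name, and the function names are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# A tiny hypothetical corpus of job postings.
JOBS = {
    "job1": "registered nurse oncology nurse hospital",
    "job2": "software engineer java hospital",
    "job3": "nurse educator university",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def important_terms(doc_id, docs, max_terms=3):
    """Rank a document's terms by tf * log(N / df) and keep the top few."""
    tokenized = {d: tokenize(t) for d, t in docs.items()}
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    term_freq = Counter(tokenized[doc_id])
    total = len(docs)
    scores = {
        term: count * math.log(total / doc_freq[term])
        for term, count in term_freq.items()
    }
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:max_terms]


def similarity_query(doc_id, docs, field="content"):
    """Turn the extracted keywords into a secondary search query."""
    return f"{field}:({' '.join(important_terms(doc_id, docs))})"


print(important_terms("job1", JOBS))
print(similarity_query("job1", JOBS))
```

Terms that occur in every document score zero under this weighting, which is why common words contribute little to the generated query.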
  • 17. Concept Based Recommendations
      Approaches:
      1) Create a Taxonomy/Dictionary to define your concepts and then either:
          a) manually tag documents as they come in
             // Very hard to scale… see Amazon Mechanical Turk if you must do this
          or
          b) create a classification system which automatically tags content as it comes in (supervised machine learning)
             // See Apache Mahout
      2) Use an unsupervised machine learning algorithm to cluster documents and dynamically discover concepts (no dictionary required).
          // This is already built into Solr using Carrot2!
  • 19. Setting Up Clustering in SolrConfig.xml <searchComponent name="clustering" enable="true" class="solr.clustering.ClusteringComponent"> <lst name="engine"> <str name="name">default</str> <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str> <str name="MultilingualClustering.defaultLanguage">ENGLISH</str> </lst> </searchComponent> <requestHandler name="/clustering" enable="true" class="solr.SearchHandler"> <lst name="defaults"> <str name="clustering.engine">default</str> <bool name="clustering.results">true</bool> <str name="fl">*,score</str> </lst> <arr name="last-components"> <str>clustering</str> </arr> </requestHandler>
  • 20. Clustering Search in Solr • /solr/clustering/?q=content:nursing &rows=100 &carrot.title=titlefield &carrot.snippet=titlefield &LingoClusteringAlgorithm.desiredClusterCountBase=25 &group=false //clustering & grouping don’t currently play nicely • Allows you to dynamically identify “concepts” and their prevalence within a user’s top search results
  • 21. Search: Nursing
  • 22. Search: .Net
  • 23. Example Concept-based Recommendation Stage 1: Identify Concepts Original Query: q=(solr OR lucene) // can be a user’s search, their job title, a list of skills, or any other keyword-rich data source Clusters Identified: Developer (22), Java Developer (13), Software (10), Senior Java Developer (9), Architect (6), Software Engineer (6), Web Developer (5), Search (3), Software Developer (3), Systems (3), Administrator (2), Hadoop Engineer (2), Java J2EE (2), Search Development (2), Software Architect (2), Solutions Architect (2) Facets Identified (occupation): Computer Software Engineers, Web Developers, ...
  • 24. Example Concept-based Recommendation Stage 2: Run Recommendations Search q=content:("Developer"^22 OR "Java Developer"^13 OR "Software"^10 OR "Senior Java Developer"^9 OR "Architect"^6 OR "Software Engineer"^6 OR "Web Developer"^5 OR "Search"^3 OR "Software Developer"^3 OR "Systems"^3 OR "Administrator"^2 OR "Hadoop Engineer"^2 OR "Java J2EE"^2 OR "Search Development"^2 OR "Software Architect"^2 OR "Solutions Architect"^2) AND occupation:("Computer Software Engineers" OR "Web Developers") // You can also add the user’s location or the original keywords to the recommendations search if it helps results quality for your use case.
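Stage 2 is mechanical once Stage 1 has produced cluster labels and counts, so it can be sketched as a small function. This is an illustration only: `concept_query` is a hypothetical name, the `content` and `occupation` fields come from the slide, and using the raw cluster size as the boost is simply what the slide's example does.

```python
def concept_query(clusters, facet_field, facet_values, field="content"):
    """Turn (label, count) clusters and top facet values into a boosted query."""
    # Each cluster label becomes a phrase boosted by its cluster size.
    boosted = " OR ".join(f'"{label}"^{count}' for label, count in clusters)
    # The facet values restrict results to the most common categories.
    facets = " OR ".join(f'"{value}"' for value in facet_values)
    return f"{field}:({boosted}) AND {facet_field}:({facets})"


clusters = [("Developer", 22), ("Java Developer", 13), ("Software", 10)]
query = concept_query(
    clusters,
    facet_field="occupation",
    facet_values=["Computer Software Engineers", "Web Developers"],
)
```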
  • 25. Example Concept-based Recommendation Stage 3: Returning the Recommendations …
  • 27. Geography and Recommendations • Filtering or boosting results based upon geographical area or distance can help greatly for certain use cases: – Jobs/Resumes, Tickets/Concerts, Restaurants • For other use cases, location sensitivity is nearly worthless: – Books, Songs, Movies /solr/select/?q=(Standard Recommendation Query) AND _val_:"(recip(geodist(location, 40.7142, 74.0064),1,1,0))" // There are dozens of well-documented ways to search/filter/sort/boost on geography in Solr. This is just one example.
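To see what the distance boost does numerically: Solr's `recip(x,m,a,b)` function evaluates to `a / (m*x + b)`, so `recip(dist,1,1,0)` is simply `1/dist`. The sketch below reproduces that arithmetic in Python; the distances are made-up sample values.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """Same formula as Solr's recip() function query: a / (m*x + b)."""
    return a / (m * x + b)


# With m=1, a=1, b=0 (as on the slide) the boost falls off as 1/distance.
boosts = {distance: recip(distance, 1, 1, 0) for distance in (1, 10, 100)}

# With b=0 a distance of exactly 0 divides by zero; a nonzero b avoids that
# and caps the boost at a/b for items at the user's exact location.
capped = recip(0, 1, 1, 1)
```

Because the boost shrinks quickly with distance, the `m`, `a`, and `b` constants are worth tuning against how far your users are actually willing to travel.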
  • 28. Behavior-based Recommendation Approaches (Collaborative Filtering)
  • 29. The Lucene Inverted Index (user behavior example) What you SEND to Lucene/Solr (Document: “Users who bought this product” Field): doc1: user1, user4, user5; doc2: user2, user3; doc3: user4; doc4: user4, user5; doc5: user4, user1; … How the content is INDEXED into Lucene/Solr, conceptually (Term: Documents): user1: doc1, doc5; user2: doc2; user3: doc2; user4: doc1, doc3, doc4, doc5; user5: doc1, doc4; …
  • 30. Collaborative Filtering • Step 1: Find similar users who like the same documents q=documentid:("doc1" OR "doc4") Document: “Users who bought this product” Field: doc1: user1, user4, user5; doc2: user2, user3; doc3: user4; doc4: user4, user5; doc5: user4, user1; … Matched documents: doc1: user1, user4, user5; doc4: user4, user5 Top Scoring Results (Most Similar Users): 1) user5 (2 shared likes) 2) user4 (2 shared likes) 3) user1 (1 shared like)
  • 31. Collaborative Filtering • Step 2: Search for docs “liked” by those similar users Most Similar Users: 1) user5 (2 shared likes) 2) user4 (2 shared likes) 3) user1 (1 shared like) /solr/select/?q=userlikes:("user5"^2 OR "user4"^2 OR "user1"^1) Term: Documents: user1: doc1, doc5; user2: doc2; user3: doc2; user4: doc1, doc3, doc4, doc5; user5: doc1, doc4; … Top Recommended Documents: 1) doc1 (matches user4, user5, user1) 2) doc4 (matches user4, user5) 3) doc5 (matches user4, user1) 4) doc3 (matches user4) // Doc 2 does not match // above example ignores idf calculations
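The two steps can be traced in plain Python using the same toy data as the slides. This is a sketch of the scoring logic only, not of how Solr executes it: the in-memory `likes` dictionary stands in for the index, the function names are hypothetical, and, like the slide, it ignores idf.

```python
from collections import Counter

# Document -> users who "liked" it (the field sent to Solr on the slides).
likes = {
    "doc1": ["user1", "user4", "user5"],
    "doc2": ["user2", "user3"],
    "doc3": ["user4"],
    "doc4": ["user4", "user5"],
    "doc5": ["user4", "user1"],
}


def similar_users(liked_docs):
    """Step 1: count how many of the given documents each user also likes."""
    shared = Counter()
    for doc in liked_docs:
        shared.update(likes.get(doc, []))
    return shared


def recommend(user_weights):
    """Step 2: score each document by the summed weights of matching users."""
    scores = Counter()
    for doc, users in likes.items():
        for user in users:
            if user in user_weights:
                scores[doc] += user_weights[user]
    return scores.most_common()


users = similar_users(["doc1", "doc4"])   # the current user likes doc1 and doc4
recommendations = recommend(users)
```

As on the slide, the documents the user already likes (doc1, doc4) come back at the top; a real query would typically exclude them with a filter.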
  • 32. Lots of Variations • Users –> Item(s) • User –> Item(s) –> Users • Item –> Users –> Item(s) • etc. User 1 User 2 User 3 User 4 … Item 1 X X X … Item 2 X X … Item 3 X X … Item 4 X … … … … … … … Note: Just because this example tags with “users” doesn’t mean you have to. You can map any entity to any other related entity and achieve a similar result.
  • 33. Comparison with Mahout • Recommendations are much easier for us to perform in Solr: – Data is already present and up-to-date – Doesn’t require writing significant code to make changes (just changing queries) – Recommendations are real-time as opposed to asynchronously processed off-line. – Allows easy utilization of any content and available functions to boost results • Our initial tests show our collaborative filtering approach in Solr significantly outperforms our Mahout tests in terms of results quality – Note: We believe that some portion of the quality issues we have with the Mahout implementation have to do with staleness of data due to the frequency with which our data is updated. • Our general takeaway: – We believe that Mahout might be able to return better matches than Solr with a lot of custom work, but it does not perform better for us out of the box. • Because we already scale… – Since we already have all of our data indexed in Solr (tens to hundreds of millions of documents), there’s no need for us to rebuild a sparse matrix in Hadoop (your needs may be different).
  • 35. Hybrid Approaches • Not much to say here, I think you get the point. • /solr/select/?q=category:("healthcare.nursing.oncology"^10 OR "healthcare.nursing"^5 OR "healthcare") OR title:"Nurse Educator"^15 AND _val_:"map(salary,40000,60000,10,0)"^5 AND _val_:"(recip(geodist(location, 40.7142, 74.0064),1,1,0))" • Combining multiple approaches generally yields better overall results if done intelligently. Experimentation is key here.
  • 36. Important Considerations & Advanced Capabilities @ CareerBuilder
  • 37. Important Considerations @ CareerBuilder • Payload Scoring • Measuring Results Quality • Understanding our Users
  • 38. Custom Scoring with Payloads • In addition to boosting search terms and fields, content within the same field can also be boosted differently using Payloads (requires a custom scoring implementation): • Content Field: design [1] / engineer [1] / really [ ] / great [ ] / job [ ] / ten [3] / years [3] / experience [3] / careerbuilder [2] / design [2], … Payload Bucket Mappings: jobtitle: bucket=[1] boost=10; company: bucket=[2] boost=4; jobdescription: bucket=[] weight=1; experience: bucket=[3] weight=1.5 We can pass in a parameter to Solr at query time specifying the boost to apply to each bucket, i.e. …&bucketWeights=1:10;2:4;3:1.5;default:1; • This allows us to map many relevancy buckets to search terms at index time and adjust the weighting at query time without having to search across hundreds of fields. • By making all scoring parameters overridable at query time, we are able to do A/B testing to consistently improve our relevancy model.
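The slide does not show the custom scoring implementation, so the following is only a guess at the general idea: parse the `bucketWeights` parameter at query time and apply the matching weight to each occurrence of a term according to the bucket stored in its payload. Summing the weights per occurrence is an assumption made for illustration, and all function names are hypothetical.

```python
def parse_bucket_weights(param: str) -> dict:
    """Parse a value such as '1:10;2:4;3:1.5;default:1;' into {bucket: weight}."""
    weights = {}
    for pair in param.split(";"):
        if pair:  # skip the empty piece after the trailing ';'
            bucket, weight = pair.split(":")
            weights[bucket] = float(weight)
    return weights


def term_weight(term: str, tokens: list, weights: dict) -> float:
    """Sum the bucket weight of every occurrence of term (assumed combination rule)."""
    total = 0.0
    for token, bucket in tokens:
        if token == term:
            # Tokens without a payload bucket fall back to the default weight.
            total += weights.get(bucket, weights["default"])
    return total


# (token, payload bucket) pairs from the slide's content field; None = no bucket.
tokens = [
    ("design", "1"), ("engineer", "1"), ("really", None), ("great", None),
    ("job", None), ("ten", "3"), ("years", "3"), ("experience", "3"),
    ("careerbuilder", "2"), ("design", "2"),
]
weights = parse_bucket_weights("1:10;2:4;3:1.5;default:1;")
design_weight = term_weight("design", tokens, weights)
```

Because the weights arrive as a request parameter, two variants of the relevancy model can be compared simply by sending different `bucketWeights` values.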
  • 39. Measuring Results Quality • A/B Testing is key to understanding our search results quality. • Users are randomly divided between equal groups • Each group experiences a different algorithm for the duration of the test • We can measure “performance” of the algorithm based upon changes in user behavior: – For us, more job applications = more relevant results – For other companies, that might translate into products purchased, additional friends requested, or non-search pages viewed • We use this to test both keyword search results and also recommendations quality
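One common way to split users into stable, roughly equal groups is to hash the user ID together with the test name. The slide only says users are divided randomly, so this hashing scheme is a substitute technique shown for illustration, and `assign_group` is a hypothetical helper.

```python
import hashlib


def assign_group(user_id: str, test_name: str, groups: list) -> str:
    """Deterministically map a user to one group for the duration of a test."""
    # Including the test name means the same user can land in different
    # groups across different tests.
    key = f"{test_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return groups[int(digest, 16) % len(groups)]


groups = ["control", "new_algorithm"]
assignment = assign_group("user-42", "recs-boost-test", groups)
```

Since the mapping is a pure function of the inputs, a user sees the same algorithm on every request without any stored state.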
  • 40. Understanding our Users (given limited information)
  • 41. Understanding Our Users • Machine learning algorithms can help us understand what matters most to different groups of users. Example: Willingness to relocate for a job (miles per percentile) [Chart: y-axis in miles, 0 to 2,500; x-axis in percentiles, 1% to 95%; series shown: Title Examiners, Abstractors, and Searchers; Software Developers, Systems Software; Food Preparation Workers]
  • 42. Key Takeaways • Recommendations can be as valuable as keyword search, or more so. • If your data fits in Solr, then you have everything you need to build an industry-leading recommendation system. • Even a single keyword can be enough to begin making meaningful recommendations. Build up intelligently from there.
  • 43. Contact Info Trey Grainger trey.grainger@careerbuilder.com http://www.careerbuilder.com @treygrainger And yes, we are hiring – come chat with me if you are interested.