SlideShare a Scribd company logo
1 of 52
Search &
Recommendations:
3 Sides of the Same Coin
Nick Pentreath
Principal Engineer
@MLnick
DBG / June 11, 2018 / © 2018 IBM Corporation
About
@MLnick on Twitter & Github
Principal Engineer, IBM
CODAIT - Center for Open-Source Data &
AI Technologies
Machine Learning & AI
Apache Spark committer & PMC
Author of Machine Learning with Spark
Various conferences & meetups
DBG / June 11, 2018 / © 2018 IBM Corporation
Center for Open Source Data
and AI Technologies
CODAIT
codait.org
DBG / June 11, 2018 / © 2018 IBM Corporation
CODAIT aims to make AI solutions
dramatically easier to create, deploy,
and manage in the enterprise
Relaunch of the Spark Technology
Center (STC) to reflect expanded
mission
Improving Enterprise AI Lifecycle in Open Source
Gather
Data
Analyze
Data
Machine
Learning
Deep
Learning
Deploy
Model
Maintain
Model
Python
Data Science
Stack
Fabric for
Deep Learning
(FfDL)
Mleap +
PFA
Scikit-LearnPandas
Apache
Spark
Apache
Spark
Jupyter
Model
Asset
eXchange
Keras +
Tensorflow
Agenda
Recommender systems overview
Search & Recommendations
3 Sides of the Coin
Performance
Summary
DBG / June 11, 2018 / © 2018 IBM Corporation
Recommender Systems
DBG / June 11, 2018 / © 2018 IBM Corporation
Users and Items
Recommender Systems
DBG / June 11, 2018 / © 2018 IBM Corporation
System Requirements
Recommender Systems
DBG / June 11, 2018 / © 2018 IBM Corporation
Filtering &
Grouping
Business
Rules
Events
DBG / June 11, 2018 / © 2018 IBM Corporation
Recommender Systems
– Implicit preference data
• Online – page view, click
• Commerce – add-to-cart, purchase, return
– Explicit preference data
• Ratings, reviews
– Intent
• Search query
– Social
• Like, share, follow, unfollow, block
Context
Recommender Systems
DBG / June 11, 2018 / © 2018 IBM Corporation
How to handle implicit feedback?
Recommender Systems
DBG / June 11, 2018 / © 2018 IBM Corporation
Cold Start
DBG / June 11, 2018 / © 2018 IBM Corporation
Recommender Systems
New items
– No historical interaction data
– Typically use baselines or item content
New (or unknown) users
– Previously unseen or anonymous users have no user
profile or historical interactions
– Have context data (but possibly very limited)
– Cannot directly use collaborative filtering models
• Item-similarity for current item
• Represent user as aggregation of items
• Contextual models can incorporate short-term history
Prediction
DBG / June 11, 2018 / © 2018 IBM Corporation
Recommender Systems
Recommendation is ranking
– Given a user and context, rank the available items in order
of likelihood that the user will interact with them
Sort
items
Serving Requirements
DBG / June 11, 2018 / © 2018 IBM Corporation
Recommender Systems
– Serving => ranking large # items
– Often need to filter
• categories, popularity, price, geo, time
– Use all data at prediction time
• content, context, preference, session
– Need to scale with item set and feature set
– Handle cold start
• content-based models or fallbacks, item-aggregates
– Easily incorporate new preference data
• model re-training, fold in
Search &
Recommendations
DBG / June 11, 2018 / © 2018 IBM Corporation
Prelude: Ratings Matrix
DBG / June 11, 2018 / © 2018 IBM Corporation
Search & Recommendation
Prelude: Item-item Co-occurrence
DBG / June 11, 2018 / © 2018 IBM Corporation
Search & Recommendation
One of the earliest approaches
– Compute item co-occurrence matrix from ratings
matrix, i.e. X t X
– Scoring can be online or pre-computed
– Similarity metric between items
Prelude: Matrix Factorization
DBG / June 11, 2018 / © 2018 IBM Corporation
Search & Recommendation
One of the de facto standard models
– Find two smaller matrices (called the factor
matrices) that approximate the ratings matrix
– Minimize the reconstruction error (i.e. rating
prediction / completion)
– Efficient, scalable algorithms
• Gradient Descent; Alternating Least Squares (ALS)
– Prediction is simple
– Can handle implicit data through weighting
Recommendation engines
DBG / June 11, 2018 / © 2018 IBM Corporation
Search & Recommendation
Recommendation is ranking
– Given a user and context, rank the available items in order
of likelihood that the user will interact with them
Sort
items
Compute
scores
Search engines
DBG / June 11, 2018 / © 2018 IBM Corporation
Search & Recommendation
Search is ranking
– Given a query, rank the available items in order of
similarity of item to query
Sort
items
“cat videos”
Compute
similarity
Meeting our Requirements?
DBG / June 11, 2018 / © 2018 IBM Corporation
Search & Recommendation
– Serving => ranking large # items
– Often need to filter
• categories, popularity, price, geo, time
– Use all data at prediction time
• content, context, preference, session
– Need to scale with item set and feature set
– Handle cold start
• content-based models or fallbacks, item-aggregates
– Easily incorporate new preference data
• model re-training, fold in
Inverted
indexing
Model
dependent
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Broad approaches
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Score-then-
search
Native search
Custom
ranking
Score-then-search
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Score-then-
search
Scoring
system
Score
Search
engine
Search
engine
ResultsFilter
Filter
Scoring
system
ResultsScore
Score-then-search
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Score-then-
search
• Complete flexibility in
model
• Easier to incorporate richer
content features
• Can optimize scoring
component
• Maintain (at least) 2
systems
• Filtering challenges
• Round trips between
systems
Native Search
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Native search
Pre-
compute
Index
Search
engine
ResultsSearch
Native Search
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Native search
Pre-compute Index
Native Search
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Native search
Indexed items Search query
Native Search
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Native search
• No changes to search
engine required
• Very fast & flexible at
query time
• Scalable - pre-compute +
thresholding
• Can use (almost) all data
• Must decide what to pre-
compute
• No ordering retained in
indexed terms
• More difficult to include
richer content data (image,
audio)
Native Search
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Native search
Universal recommender
The Mahout Correlated Cross-Occurrence
Algorithm
Custom Ranking
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Custom
ranking
Scoring
system
Score & filter
Search
engine
Results
One system
Search Ranking
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Can we use the same machinery?
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Delimited Payload Filter
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Custom Scoring Function
DBG / June 11, 2018 / © 2018 IBM Corporation
3 Sides of the Coin
– Native script (Java), compiled for speed
– Scoring function computes dot product by:
• For each document vector index (“term”), retrieve
payload
• Score += payload * query(i)
– Normalizes with query vector norm and document
vector norm for cosine similarity
Can we use the same machinery?
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Get search for free!
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Custom Ranking
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Custom
ranking
• Combine search & scoring
into one system
• Potential to incorporate
richer content features &
contextual models
• Combine best of both
worlds between pre-
computing & online scoring
(sort of)
• Requires custom plugin for
your search engine
• Scaling limits &
performance
considerations
• Must decide how to blend
interactions (e.g. weights)
Custom Ranking
3 Sides of the Coin
DBG / June 11, 2018 / © 2018 IBM Corporation
Custom
ranking
Spark & Elasticsearch Recommender on
IBM Code
Elasticsearch vector scoring plugin
Performance
DBG / June 11, 2018 / © 2018 IBM Corporation
Custom Scoring Performance
DBG / June 11, 2018 / © 2018 IBM Corporation
Performance
*3x nodes, 30x shards
Custom Scoring Performance
DBG / June 11, 2018 / © 2018 IBM Corporation
Performance
*3x nodes, k=50
Comparison to Score-then-search
DBG / June 11, 2018 / © 2018 IBM Corporation
Performance
*3x nodes, 30x shards, k=50, 1,000,000 items
Scaling custom scoring - LSH
DBG / June 11, 2018 / © 2018 IBM Corporation
Performance
*3x nodes, 30x shards, k=50, 1,000,000 items
Pure Search
Approaches
DBG / June 11, 2018 / © 2018 IBM Corporation
“Pure” search engine approaches
DBG / June 11, 2018 / © 2018 IBM Corporation
Pure Search
– More like this
• Content similarity
– Significant terms queries
• 2 stage query:
– 1st query interactions to get set of items for a user
– 2nd significant terms aggregation with the item set
as background
• Result is very similar to co-occurrence
approaches
• Round trips may be slow
Conclusion
DBG / June 11, 2018 / © 2018 IBM Corporation
Custom Ranking – Future Directions
DBG / June 11, 2018 / © 2018 IBM Corporation
Conclusion
– Apache Solr version
• https://github.com/saaay71/solr-vector-scoring
– Improve performance
• Improved scoring performance for vector scoring
plugin: https://github.com/lior-k/fast-elasticsearch-
vector-scoring
• Investigate performance of LSH-filtered scoring
• Dig deeper into Lucene internals to combine matrix-
vector math with search & filter?
– Investigate more complex models
Summary
DBG / June 11, 2018 / © 2018 IBM Corporation
Conclusion
All approaches have different tradeoffs
Score-then-
search
Native search
Custom
ranking
Max model flexibility Max search integrationBalanced
Thank you
codait.org
twitter.com/MLnick
github.com/MLnick
developer.ibm.com/code
DBG / June 11, 2018 / © 2018 IBM Corporation
FfDL
Sign up for IBM Cloud and try Watson Studio!
https://ibm.biz/BdZ6qf
IBM Code Pattern
https://datascience.ibm.com/
MAX
Call for Code inspires developers
to solve pressing global problems
with sustainable software
delivering
on their vast potential to do good.
Bringing together NGOs,
institutions, enterprises, and
startup developers to compete
build effective disaster mitigation
solutions, with a focus on health
and well-being.
International Federation of Red
Cross/Red Crescent, The American
American Red Cross, and the
United Nations Office of Human
Rights combine for the Call for
Code Award to elevate the profile
of developers.
Award winners will receive long-term
support through open source
foundations, financial prizes, the
opportunity to present their solution
to leading VCs, and will deploy their
solution through IBM’s Corporate
Service Corps.
Developers will jump-start their
project with dedicated IBM Code
Patterns, combined with optional
enterprise technology to build
projects over the course of three
months.
Judged by the world’s most renowned
technologists, the grand prize will be
presented in October at an Award
Event.
developer.ibm.com/callforcode
Links & References
Spark & Elasticsearch Recommender on IBM Code
Elasticsearch vector scoring plugin
Solr vector scoring plugin
Improved performance Elasticsearch plugin
Elasticsearch More Like This Query
Elasticsearch significant terms aggregation
Universal recommender
The Mahout Correlated Cross-Occurrence Algorithm
DBG / June 11, 2018 / © 2018 IBM Corporation
DBG / June 11, 2018 / © 2018 IBM Corporation

More Related Content

What's hot

AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...Cambridge Semantics
 
Google search vs Solr search for Enterprise search
Google search vs Solr search for Enterprise searchGoogle search vs Solr search for Enterprise search
Google search vs Solr search for Enterprise searchVeera Shekar
 
Integrating Hadoop in Your Existing DW and BI Environment
Integrating Hadoop in Your Existing DW and BI EnvironmentIntegrating Hadoop in Your Existing DW and BI Environment
Integrating Hadoop in Your Existing DW and BI EnvironmentCloudera, Inc.
 
Fireside Chat with Bloor Research: State of the Graph Database Market 2020
Fireside Chat with Bloor Research: State of the Graph Database Market 2020Fireside Chat with Bloor Research: State of the Graph Database Market 2020
Fireside Chat with Bloor Research: State of the Graph Database Market 2020Cambridge Semantics
 
Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?Cambridge Semantics
 
Syngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DSyngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DMichael Swanson
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentationTao Feng
 
Getting It System Toolkit: A customizable future for ILL & Acquisition
Getting It System Toolkit: A customizable future for ILL & AcquisitionGetting It System Toolkit: A customizable future for ILL & Acquisition
Getting It System Toolkit: A customizable future for ILL & AcquisitionTim Bowersox
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryNeo4j
 
Getting It System Toolkit - GIST for Web: Discovery & Delivery Changed
Getting It System Toolkit - GIST for Web: Discovery & Delivery ChangedGetting It System Toolkit - GIST for Web: Discovery & Delivery Changed
Getting It System Toolkit - GIST for Web: Discovery & Delivery ChangedTim Bowersox
 
Cortana Analytics Workshop: Developing for Power BI
Cortana Analytics Workshop: Developing for Power BICortana Analytics Workshop: Developing for Power BI
Cortana Analytics Workshop: Developing for Power BIMSAdvAnalytics
 
Enable the business and make Artificial Intelligence accessible for everyone!
Enable the business and make Artificial Intelligence accessible for everyone! Enable the business and make Artificial Intelligence accessible for everyone!
Enable the business and make Artificial Intelligence accessible for everyone! Marc Lelijveld
 
GIST Acquisitions Manager Preview - 2011 ILLiad Conference
GIST Acquisitions Manager Preview - 2011 ILLiad ConferenceGIST Acquisitions Manager Preview - 2011 ILLiad Conference
GIST Acquisitions Manager Preview - 2011 ILLiad ConferenceGetting It System Toolkit
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxElasticsearch
 
SqlSaturday#699 Power BI - Create a dashboard from zero to hero
SqlSaturday#699 Power BI - Create a dashboard from zero to heroSqlSaturday#699 Power BI - Create a dashboard from zero to hero
SqlSaturday#699 Power BI - Create a dashboard from zero to heroVishal Pawar
 
Solution architecture
Solution architectureSolution architecture
Solution architectureRajat Agrawal
 
Growth hacking in the age of Data
Growth hacking in the age of DataGrowth hacking in the age of Data
Growth hacking in the age of DataDaniel Saito
 
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...semanticsconference
 

What's hot (20)

Solution architecture for big data projects
Solution architecture for big data projectsSolution architecture for big data projects
Solution architecture for big data projects
 
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
 
Google search vs Solr search for Enterprise search
Google search vs Solr search for Enterprise searchGoogle search vs Solr search for Enterprise search
Google search vs Solr search for Enterprise search
 
Integrating Hadoop in Your Existing DW and BI Environment
Integrating Hadoop in Your Existing DW and BI EnvironmentIntegrating Hadoop in Your Existing DW and BI Environment
Integrating Hadoop in Your Existing DW and BI Environment
 
Fireside Chat with Bloor Research: State of the Graph Database Market 2020
Fireside Chat with Bloor Research: State of the Graph Database Market 2020Fireside Chat with Bloor Research: State of the Graph Database Market 2020
Fireside Chat with Bloor Research: State of the Graph Database Market 2020
 
Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?
 
Syngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DSyngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&D
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
Getting It System Toolkit: A customizable future for ILL & Acquisition
Getting It System Toolkit: A customizable future for ILL & AcquisitionGetting It System Toolkit: A customizable future for ILL & Acquisition
Getting It System Toolkit: A customizable future for ILL & Acquisition
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
Enterprise architecture for big data projects
Enterprise architecture for big data projectsEnterprise architecture for big data projects
Enterprise architecture for big data projects
 
Getting It System Toolkit - GIST for Web: Discovery & Delivery Changed
Getting It System Toolkit - GIST for Web: Discovery & Delivery ChangedGetting It System Toolkit - GIST for Web: Discovery & Delivery Changed
Getting It System Toolkit - GIST for Web: Discovery & Delivery Changed
 
Cortana Analytics Workshop: Developing for Power BI
Cortana Analytics Workshop: Developing for Power BICortana Analytics Workshop: Developing for Power BI
Cortana Analytics Workshop: Developing for Power BI
 
Enable the business and make Artificial Intelligence accessible for everyone!
Enable the business and make Artificial Intelligence accessible for everyone! Enable the business and make Artificial Intelligence accessible for everyone!
Enable the business and make Artificial Intelligence accessible for everyone!
 
GIST Acquisitions Manager Preview - 2011 ILLiad Conference
GIST Acquisitions Manager Preview - 2011 ILLiad ConferenceGIST Acquisitions Manager Preview - 2011 ILLiad Conference
GIST Acquisitions Manager Preview - 2011 ILLiad Conference
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
 
SqlSaturday#699 Power BI - Create a dashboard from zero to hero
SqlSaturday#699 Power BI - Create a dashboard from zero to heroSqlSaturday#699 Power BI - Create a dashboard from zero to hero
SqlSaturday#699 Power BI - Create a dashboard from zero to hero
 
Solution architecture
Solution architectureSolution architecture
Solution architecture
 
Growth hacking in the age of Data
Growth hacking in the age of DataGrowth hacking in the age of Data
Growth hacking in the age of Data
 
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
 

Similar to Search and Recommendations: 3 Sides of the Same Coin

Recurrent Neural Networks for Recommendations and Personalization with Nick P...
Recurrent Neural Networks for Recommendations and Personalization with Nick P...Recurrent Neural Networks for Recommendations and Personalization with Nick P...
Recurrent Neural Networks for Recommendations and Personalization with Nick P...Databricks
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...Alok Singh
 
Applying NLP and Machine Learning to Keyword Analysis
Applying NLP and Machine Learning to Keyword AnalysisApplying NLP and Machine Learning to Keyword Analysis
Applying NLP and Machine Learning to Keyword AnalysisDan Segal
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit
 
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliL'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliData Driven Innovation
 
Get the Most out of Your Amazon Elasticsearch Service Domain (ANT334-R1) - AW...
Get the Most out of Your Amazon Elasticsearch Service Domain (ANT334-R1) - AW...Get the Most out of Your Amazon Elasticsearch Service Domain (ANT334-R1) - AW...
Get the Most out of Your Amazon Elasticsearch Service Domain (ANT334-R1) - AW...Amazon Web Services
 
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...Lucidworks
 
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Amazon Web Services
 
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation FrameworkMongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation FrameworkMongoDB
 
Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...
Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...
Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...Amazon Web Services
 
Elastic Stack: Using data for insight and action
Elastic Stack: Using data for insight and actionElastic Stack: Using data for insight and action
Elastic Stack: Using data for insight and actionElasticsearch
 
Master the RETE algorithm
Master the RETE algorithmMaster the RETE algorithm
Master the RETE algorithmMasahiko Umeno
 
How Financial Services can Save On File Storage
How Financial Services can Save On File Storage How Financial Services can Save On File Storage
How Financial Services can Save On File Storage Charly Mostert
 
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Amazon Web Services
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...Amazon Web Services
 
Neo4j GraphTour Santa Monica 2019 - Amundsen Presentation
Neo4j GraphTour Santa Monica 2019 - Amundsen PresentationNeo4j GraphTour Santa Monica 2019 - Amundsen Presentation
Neo4j GraphTour Santa Monica 2019 - Amundsen PresentationTamikaTannis
 
Query your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceQuery your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceAWS Germany
 
Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan ...
Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan ...Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan ...
Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan ...Databricks
 

Similar to Search and Recommendations: 3 Sides of the Same Coin (20)

Recurrent Neural Networks for Recommendations and Personalization with Nick P...
Recurrent Neural Networks for Recommendations and Personalization with Nick P...Recurrent Neural Networks for Recommendations and Personalization with Nick P...
Recurrent Neural Networks for Recommendations and Personalization with Nick P...
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
 
Applying NLP and Machine Learning to Keyword Analysis
Applying NLP and Machine Learning to Keyword AnalysisApplying NLP and Machine Learning to Keyword Analysis
Applying NLP and Machine Learning to Keyword Analysis
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
 
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliL'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli
 
Get the Most out of Your Amazon Elasticsearch Service Domain (ANT334-R1) - AW...
Get the Most out of Your Amazon Elasticsearch Service Domain (ANT334-R1) - AW...Get the Most out of Your Amazon Elasticsearch Service Domain (ANT334-R1) - AW...
Get the Most out of Your Amazon Elasticsearch Service Domain (ANT334-R1) - AW...
 
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
 
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
 
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation FrameworkMongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
 
Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...
Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...
Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...
 
Elastic Stack: Using data for insight and action
Elastic Stack: Using data for insight and actionElastic Stack: Using data for insight and action
Elastic Stack: Using data for insight and action
 
Master the RETE algorithm
Master the RETE algorithmMaster the RETE algorithm
Master the RETE algorithm
 
How Financial Services can Save On File Storage
How Financial Services can Save On File Storage How Financial Services can Save On File Storage
How Financial Services can Save On File Storage
 
Non-Relational Revolution
Non-Relational RevolutionNon-Relational Revolution
Non-Relational Revolution
 
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
 
Neo4j GraphTour Santa Monica 2019 - Amundsen Presentation
Neo4j GraphTour Santa Monica 2019 - Amundsen PresentationNeo4j GraphTour Santa Monica 2019 - Amundsen Presentation
Neo4j GraphTour Santa Monica 2019 - Amundsen Presentation
 
Query your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceQuery your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performance
 
Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan ...
Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan ...Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan ...
Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan ...
 

More from Nick Pentreath

Notebook-based AI Pipelines with Elyra and Kubeflow
Notebook-based AI Pipelines with Elyra and KubeflowNotebook-based AI Pipelines with Elyra and Kubeflow
Notebook-based AI Pipelines with Elyra and KubeflowNick Pentreath
 
Scaling up deep learning by scaling down
Scaling up deep learning by scaling downScaling up deep learning by scaling down
Scaling up deep learning by scaling downNick Pentreath
 
End-to-End Deep Learning Deployment with ONNX
End-to-End Deep Learning Deployment with ONNXEnd-to-End Deep Learning Deployment with ONNX
End-to-End Deep Learning Deployment with ONNXNick Pentreath
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesNick Pentreath
 
AI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayAI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayNick Pentreath
 
IBM Developer Model Asset eXchange
IBM Developer Model Asset eXchangeIBM Developer Model Asset eXchange
IBM Developer Model Asset eXchangeNick Pentreath
 
IBM Developer Model Asset eXchange - Deep Learning for Everyone
IBM Developer Model Asset eXchange - Deep Learning for EveryoneIBM Developer Model Asset eXchange - Deep Learning for Everyone
IBM Developer Model Asset eXchange - Deep Learning for EveryoneNick Pentreath
 
Productionizing Spark ML Pipelines with the Portable Format for Analytics
Productionizing Spark ML Pipelines with the Portable Format for AnalyticsProductionizing Spark ML Pipelines with the Portable Format for Analytics
Productionizing Spark ML Pipelines with the Portable Format for AnalyticsNick Pentreath
 

More from Nick Pentreath (8)

Notebook-based AI Pipelines with Elyra and Kubeflow
Notebook-based AI Pipelines with Elyra and KubeflowNotebook-based AI Pipelines with Elyra and Kubeflow
Notebook-based AI Pipelines with Elyra and Kubeflow
 
Scaling up deep learning by scaling down
Scaling up deep learning by scaling downScaling up deep learning by scaling down
Scaling up deep learning by scaling down
 
End-to-End Deep Learning Deployment with ONNX
End-to-End Deep Learning Deployment with ONNXEnd-to-End Deep Learning Deployment with ONNX
End-to-End Deep Learning Deployment with ONNX
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI Pipelines
 
AI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayAI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI Day
 
IBM Developer Model Asset eXchange
IBM Developer Model Asset eXchangeIBM Developer Model Asset eXchange
IBM Developer Model Asset eXchange
 
IBM Developer Model Asset eXchange - Deep Learning for Everyone
IBM Developer Model Asset eXchange - Deep Learning for EveryoneIBM Developer Model Asset eXchange - Deep Learning for Everyone
IBM Developer Model Asset eXchange - Deep Learning for Everyone
 
Productionizing Spark ML Pipelines with the Portable Format for Analytics
Productionizing Spark ML Pipelines with the Portable Format for AnalyticsProductionizing Spark ML Pipelines with the Portable Format for Analytics
Productionizing Spark ML Pipelines with the Portable Format for Analytics
 

Recently uploaded

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 

Recently uploaded (20)

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 

Search and Recommendations: 3 Sides of the Same Coin

  • 1. Search & Recommendations: 3 Sides of the Same Coin Nick Pentreath Principal Engineer @MLnick DBG / June 11, 2018 / © 2018 IBM Corporation
  • 2. About @MLnick on Twitter & Github Principal Engineer, IBM CODAIT - Center for Open-Source Data & AI Technologies Machine Learning & AI Apache Spark committer & PMC Author of Machine Learning with Spark Various conferences & meetups DBG / June 11, 2018 / © 2018 IBM Corporation
  • 3. Center for Open Source Data and AI Technologies CODAIT codait.org DBG / June 11, 2018 / © 2018 IBM Corporation CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise Relaunch of the Spark Technology Center (STC) to reflect expanded mission Improving Enterprise AI Lifecycle in Open Source Gather Data Analyze Data Machine Learning Deep Learning Deploy Model Maintain Model Python Data Science Stack Fabric for Deep Learning (FfDL) Mleap + PFA Scikit-LearnPandas Apache Spark Apache Spark Jupyter Model Asset eXchange Keras + Tensorflow
  • 4. Agenda Recommender systems overview Search & Recommendations 3 Sides of the Coin Performance Summary DBG / June 11, 2018 / © 2018 IBM Corporation
  • 5. Recommender Systems DBG / June 11, 2018 / © 2018 IBM Corporation
  • 6. Users and Items Recommender Systems DBG / June 11, 2018 / © 2018 IBM Corporation
  • 7. System Requirements Recommender Systems DBG / June 11, 2018 / © 2018 IBM Corporation Filtering & Grouping Business Rules
  • 8. Events DBG / June 11, 2018 / © 2018 IBM Corporation Recommender Systems – Implicit preference data • Online – page view, click • Commerce – add-to-cart, purchase, return – Explicit preference data • Ratings, reviews – Intent • Search query – Social • Like, share, follow, unfollow, block
  • 9. Context Recommender Systems DBG / June 11, 2018 / © 2018 IBM Corporation
  • 10. How to handle implicit feedback? Recommender Systems DBG / June 11, 2018 / © 2018 IBM Corporation
  • 11. Cold Start DBG / June 11, 2018 / © 2018 IBM Corporation Recommender Systems New items – No historical interaction data – Typically use baselines or item content New (or unknown) users – Previously unseen or anonymous users have no user profile or historical interactions – Have context data (but possibly very limited) – Cannot directly use collaborative filtering models • Item-similarity for current item • Represent user as aggregation of items • Contextual models can incorporate short-term history
  • 12. Prediction DBG / June 11, 2018 / © 2018 IBM Corporation Recommender Systems Recommendation is ranking – Given a user and context, rank the available items in order of likelihood that the user will interact with them Sort items
  • 13. Serving Requirements DBG / June 11, 2018 / © 2018 IBM Corporation Recommender Systems – Serving => ranking large # items – Often need to filter • categories, popularity, price, geo, time – Use all data at prediction time • content, context, preference, session – Need to scale with item set and feature set – Handle cold start • content-based models or fallbacks, item-aggregates – Easily incorporate new preference data • model re-training, fold in
  • 14. Search & Recommendations DBG / June 11, 2018 / © 2018 IBM Corporation
  • 15. Prelude: Ratings Matrix DBG / June 11, 2018 / © 2018 IBM Corporation Search & Recommendation
  • 16. Prelude: Item-item Co-occurrence DBG / June 11, 2018 / © 2018 IBM Corporation Search & Recommendation One of the earliest approaches – Compute item co-occurrence matrix from ratings matrix, i.e. X t X – Scoring can be online or pre-computed – Similarity metric between items
  • 17. Prelude: Matrix Factorization DBG / June 11, 2018 / © 2018 IBM Corporation Search & Recommendation One of the de facto standard models – Find two smaller matrices (called the factor matrices) that approximate the ratings matrix – Minimize the reconstruction error (i.e. rating prediction / completion) – Efficient, scalable algorithms • Gradient Descent; Alternating Least Squares (ALS) – Prediction is simple – Can handle implicit data through weighting
  • 18. Recommendation engines DBG / June 11, 2018 / © 2018 IBM Corporation Search & Recommendation Recommendation is ranking – Given a user and context, rank the available items in order of likelihood that the user will interact with them Sort items Compute scores
  • 19. Search engines DBG / June 11, 2018 / © 2018 IBM Corporation Search & Recommendation Search is ranking – Given a query, rank the available items in order of similarity of item to query Sort items “cat videos” Compute similarity
  • 20. Meeting our Requirements? DBG / June 11, 2018 / © 2018 IBM Corporation Search & Recommendation – Serving => ranking large # items – Often need to filter • categories, popularity, price, geo, time – Use all data at prediction time • content, context, preference, session – Need to scale with item set and feature set – Handle cold start • content-based models or fallbacks, item-aggregates – Easily incorporate new preference data • model re-training, fold in Inverted indexing Model dependent
  • 21. 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation
  • 22. Broad approaches 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation Score-then- search Native search Custom ranking
  • 23. Score-then-search 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation Score-then- search Scoring system Score Search engine Search engine ResultsFilter Filter Scoring system ResultsScore
  • 24. Score-then-search 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation Score-then- search • Complete flexibility in model • Easier to incorporate richer content features • Can optimize scoring component • Maintain (at least) 2 systems • Filtering challenges • Round trips between systems
  • 25. Native Search 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation Native search Pre- compute Index Search engine ResultsSearch
  • 26. Native Search 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation Native search Pre-compute Index
  • 27. Native Search 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation Native search Indexed items Search query
  • 28. Native Search 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation Native search • No changes to search engine required • Very fast & flexible at query time • Scalable - pre-compute + thresholding • Can use (almost) all data • Must decide what to pre- compute • No ordering retained in indexed terms • More difficult to include richer content data (image, audio)
  • 29. Native Search 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation Native search Universal recommender The Mahout Correlated Cross-Occurrence Algorithm
  • 30. Custom Ranking 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation Custom ranking Scoring system Score & filter Search engine Results One system
  • 31. Search Ranking 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation
  • 32. Can we use the same machinery? 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation
  • 33. Delimited Payload Filter 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation
  • 34. Custom Scoring Function DBG / June 11, 2018 / © 2018 IBM Corporation 3 Sides of the Coin – Native script (Java), compiled for speed – Scoring function computes dot product by: • For each document vector index (“term”), retrieve payload • Score += payload * query(i) – Normalizes with query vector norm and document vector norm for cosine similarity
  • 35. Can we use the same machinery? 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation
  • 36. Get search for free! 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation
  • 37. Custom Ranking 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation Custom ranking • Combine search & scoring into one system • Potential to incorporate richer content features & contextual models • Combine best of both worlds between pre- computing & online scoring (sort of) • Requires custom plugin for your search engine • Scaling limits & performance considerations • Must decide how to blend interactions (e.g. weights)
  • 38. Custom Ranking 3 Sides of the Coin DBG / June 11, 2018 / © 2018 IBM Corporation Custom ranking Spark & Elasticsearch Recommender on IBM Code Elasticsearch vector scoring plugin
  • 39. Performance DBG / June 11, 2018 / © 2018 IBM Corporation
  • 40. Custom Scoring Performance DBG / June 11, 2018 / © 2018 IBM Corporation Performance *3x nodes, 30x shards
  • 41. Custom Scoring Performance DBG / June 11, 2018 / © 2018 IBM Corporation Performance *3x nodes, k=50
  • 42. Comparison to Score-then-search DBG / June 11, 2018 / © 2018 IBM Corporation Performance *3x nodes, 30x shards, k=50, 1,000,000 items
  • 43. Scaling custom scoring - LSH DBG / June 11, 2018 / © 2018 IBM Corporation Performance *3x nodes, 30x shards, k=50, 1,000,000 items
  • 44. Pure Search Approaches DBG / June 11, 2018 / © 2018 IBM Corporation
  • 45. “Pure” search engine approaches DBG / June 11, 2018 / © 2018 IBM Corporation Pure Search – More like this • Content similarity – Significant terms queries • 2 stage query: – 1st query interactions to get set of items for a user – 2nd significant terms aggregation with the item set as background • Result is very similar to co-occurrence approaches • Round trips may be slow
  • 46. Conclusion DBG / June 11, 2018 / © 2018 IBM Corporation
  • 47. Custom Ranking – Future Directions DBG / June 11, 2018 / © 2018 IBM Corporation Conclusion – Apache Solr version • https://github.com/saaay71/solr-vector-scoring – Improve performance • Improved scoring performance for vector scoring plugin: https://github.com/lior-k/fast-elasticsearch- vector-scoring • Investigate performance of LSH-filtered scoring • Dig deeper into Lucene internals to combine matrix- vector math with search & filter? – Investigate more complex models
  • 48. Summary DBG / June 11, 2018 / © 2018 IBM Corporation Conclusion All approaches have different tradeoffs Score-then- search Native search Custom ranking Max model flexibility Max search integrationBalanced
  • 49. Thank you codait.org twitter.com/MLnick github.com/MLnick developer.ibm.com/code DBG / June 11, 2018 / © 2018 IBM Corporation FfDL Sign up for IBM Cloud and try Watson Studio! https://ibm.biz/BdZ6qf IBM Code Pattern https://datascience.ibm.com/ MAX
  • 50. Call for Code inspires developers to solve pressing global problems with sustainable software delivering on their vast potential to do good. Bringing together NGOs, institutions, enterprises, and startup developers to compete build effective disaster mitigation solutions, with a focus on health and well-being. International Federation of Red Cross/Red Crescent, The American American Red Cross, and the United Nations Office of Human Rights combine for the Call for Code Award to elevate the profile of developers. Award winners will receive long-term support through open source foundations, financial prizes, the opportunity to present their solution to leading VCs, and will deploy their solution through IBM’s Corporate Service Corps. Developers will jump-start their project with dedicated IBM Code Patterns, combined with optional enterprise technology to build projects over the course of three months. Judged by the world’s most renowned technologists, the grand prize will be presented in October at an Award Event. developer.ibm.com/callforcode
  • 51. Links & References Spark & Elasticsearch Recommender on IBM Code Elasticsearch vector scoring plugin Solr vector scoring plugin Improved performance Elasticsearch plugin Elasticsearch More Like This Query Elasticsearch significant terms aggregation Universal recommender The Mahout Correlated Cross-Occurrence Algorithm DBG / June 11, 2018 / © 2018 IBM Corporation
  • 52. DBG / June 11, 2018 / © 2018 IBM Corporation