Leveraging Dimensionality for Graph-Based
Recommendations with Sparse Data
WILL EVANS | GRAPHABLE
© GRAPHABLE, INC 2020
Intro
Will Evans, VP Strategy and Innovation at Graphable
Working with Neo4j for 4+ years
My first project was a recommendation engine for
blogs, which I did completely wrong
This will be recommendations + experience to help
guide you in a better direction
Graphable: Consultancy, Exclusive Hume Reseller in
the US
Beer Graph Recommendations
The goal for today’s session is to start with a dataset and a business goal, and take you all the
way from schema design to usable results that could be deployed
1. Use Case
2. Schema Design
3. Data Ingestion
4. Cypher Development
5. Finding Similarity for Recommendations
6. Conclusion
Use Case: Recommendations
Recommendation engines are always critical for a good customer experience
These recommendation engines are especially necessary when the customer is faced with a large product
assortment where searching is not feasible, or product discovery is desired
Additionally, recommendation engines are critical to personalization and making sure each
individual customer is shown the products they are most likely to want to buy
◦ At the same time, we want to show each customer the most diverse set of products possible within this
space
But businesses with large product assortments and a customer base that would benefit
from personalization usually:
1. Have sparse data that doesn't lend itself to most machine learning algorithms
2. Involve a customer pool of unique individuals with diverse taste profiles
Progress, not perfection
DON’T LET THE PERFECT BE THE ENEMY OF THE GOOD
Nick Dolding/Getty Images
Schema Design: Goal and Data
GOAL: Recommend beer to users.
DATA:
- Sparse, high-dimensional data
- Dimensions: beer, beer style, brewer, review
text, glass type, flavors, beer name, beer name
sentiment, country, user beer journey/expertise,
review sentiment, etc. etc.
- Facts: Users, review scores
- P > N (more dimensions than observations)
- Most Beers have 1 or fewer reviews, most
Users have reviewed 2 or fewer beers
Schema Design: The design
With sparse, high-dimensional data the schema design is
often simple. In our instance:
1. Beer is the center of our universe, so we start with the
beer. Where does beer come from?
2. Beer comes from a Brewer, and Brewers have
multiple beers.
3. Beers also are of a certain style, and Styles have
many beers
4. Users are unrelated to any of the other entities, so they
become an additional entity
5. Reviews only exist as an interaction of User and Beer
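The five design steps above can be sketched as plain Python types. This is only an illustration, not the graph model used in the talk: the field names, the sample data, and the choice to model a Review as a record linking a User to a Beer are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Brewer:
    name: str


@dataclass(frozen=True)
class Style:
    name: str


@dataclass(frozen=True)
class Beer:
    name: str
    brewer: Brewer  # a Beer comes from one Brewer; a Brewer has many Beers
    style: Style    # a Beer is of one Style; a Style has many Beers


@dataclass(frozen=True)
class User:
    name: str       # Users connect to nothing except through Reviews


@dataclass(frozen=True)
class Review:
    user: User      # a Review only exists as a User-Beer interaction
    beer: Beer
    overall: float
    text: str = ""


# Hypothetical sample data
brewer = Brewer("Example Brewing Co.")
ipa = Style("IPA")
beers = [Beer("Hoppy Trail", brewer, ipa), Beer("Citrus Haze", brewer, ipa)]
reviews = [Review(User("alice"), beers[0], overall=4.5, text="bright citrus hops")]


def beers_by_brewer(beers):
    """Group beer names under their brewer, i.e. walk Brewer -> Beer."""
    index = {}
    for beer in beers:
        index.setdefault(beer.brewer.name, []).append(beer.name)
    return index
```

In a graph database the same shape would typically become four node labels plus relationships, with the review carried on the User-to-Beer connection.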
Data Ingestion: Source Data
We used Orchestra in GraphAware’s Hume for loading data, but a transformation script and LOAD
CSV or a bulk load would work just as well
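A minimal sketch of the "transformation script" option, assuming a flat review file with one row per review. The column names, the sample rows, and the name normalization rule are hypothetical; the real source data may look quite different.

```python
import csv
import io

# Hypothetical source file: one row per review
SAMPLE_CSV = """beer_name,brewer_name,style,user,overall
Hoppy Trail, Example Brewing Co. ,IPA,alice,4.5
Citrus Haze,example brewing co.,IPA,bob,4.0
Midnight Oats,Other Brewery,Stout,alice,3.5
"""


def normalize(name):
    """Collapse whitespace and lowercase so spelling variants of a name merge."""
    return " ".join(name.split()).lower()


def transform(csv_text):
    """Split flat review rows into de-duplicated entities plus review facts."""
    brewers, styles, beers, reviews = set(), set(), {}, []
    for row in csv.DictReader(io.StringIO(csv_text)):
        brewer = normalize(row["brewer_name"])
        style = normalize(row["style"])
        beer = normalize(row["beer_name"])
        brewers.add(brewer)
        styles.add(style)
        beers[beer] = {"brewer": brewer, "style": style}
        reviews.append((normalize(row["user"]), beer, float(row["overall"])))
    return brewers, styles, beers, reviews


brewers, styles, beers, reviews = transform(SAMPLE_CSV)
```

The four outputs map onto the schema one to one, so each could be written to its own file for LOAD CSV or a bulk load.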
Data Ingestion:
Brewer Names
Cypher Development: Overview
Don’t boil the ocean
Start with individual modules and unify them at the end
◦ Test review scores by X dimension, and explore results
◦ Start aggregating and weighting
◦ Avoid complexity if it’s not needed. You can achieve good, repeatable results from a hard dataset using a
collection of basic techniques
◦ Rule-based+
Clustering algorithms, node2vec, etc. have diminishing returns, and results depend heavily on the
data size. Only after you’ve deployed and measured good, repeatable recommendations in production,
do a thorough cost analysis of the ROI: the improved accuracy compared to the effort
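One way to "test review scores by X dimension" is a small, reusable aggregation that can be pointed at one dimension at a time. The sketch below uses Python rather than Cypher, and the record fields and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flattened review records
reviews = [
    {"beer": "Hoppy Trail", "style": "IPA", "brewer": "Example Brewing Co.", "overall": 4.5},
    {"beer": "Citrus Haze", "style": "IPA", "brewer": "Example Brewing Co.", "overall": 4.0},
    {"beer": "Midnight Oats", "style": "Stout", "brewer": "Other Brewery", "overall": 3.5},
]


def score_by_dimension(reviews, dimension, min_reviews=1):
    """Return {dimension value: (mean overall score, review count)}."""
    groups = defaultdict(list)
    for review in reviews:
        groups[review[dimension]].append(review["overall"])
    return {
        value: (mean(scores), len(scores))
        for value, scores in groups.items()
        if len(scores) >= min_reviews
    }


by_style = score_by_dimension(reviews, "style")
by_brewer = score_by_dimension(reviews, "brewer", min_reviews=2)
```

Keeping the review count next to the mean matters with sparse data: an average built from a single review should not be weighted like one built from hundreds.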
Cypher Development: Building
Recommendations
Building a recommendation engine with sparse, high-dimensional data is a tricky process
◦ Cold (cool?) start problem
◦ Many users have reviewed 1 beer, and many beers have only 1 review
◦ Very little overlap across users and beers they've reviewed
◦ Lack of any discernible pattern to walk between Users and Beers and find “similar” ones
◦ Some brewers have many beers, some have few
◦ Overlapping beer styles that do not clearly identify beer flavor
What we cannot and should not do:
◦ Recommend beers based on co-review occurrence (collaborative filtering)
◦ Base recommendation on beers with highest scores
◦ Simply parse review text to extract vector embeddings to find similar reviews and then recommend the corresponding beers
Proceed carefully to prevent narrow and mismatched recommendations
Cypher Development: "Cold Start"
Solution
Introducing new products to people is never easy: push the wrong product and bad things happen
Since so many users have only reviewed one or two beers, they would never get relevant beers
recommended to them
The solution is to leverage the beer-to-style hierarchy: find highly related styles, then drill back down to
beers
◦ Even with this approach we are still left with too large a set
◦ Ranking by review score is still not enough
Next, a little NLP without NLP, leveraging review text to find words that show up in beer names
◦ Adds an additional dimension to the search
◦ Uses the phrasing of a user's reviews to find beers whose names match, with the matches
grounded by review scores
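The cold-start steps above can be sketched as follows. This is not the query from the talk: the beers, the RELATED_STYLES mapping (the deck does not say how related styles are found), the stopword list, and the ranking rule (name-word overlap first, average score second) are all assumptions made for illustration.

```python
import re

# Hypothetical catalogue: beer name -> style and average review score
BEERS = {
    "Hoppy Trail": {"style": "IPA", "avg_score": 4.2},
    "Citrus Haze": {"style": "Hazy IPA", "avg_score": 4.0},
    "Pine Resin Pale": {"style": "Pale Ale", "avg_score": 4.4},
    "Midnight Oats": {"style": "Stout", "avg_score": 4.6},
}

# Hypothetical, hand-supplied map of highly related styles
RELATED_STYLES = {"IPA": {"IPA", "Hazy IPA", "Pale Ale"}}

STOPWORDS = {"a", "an", "and", "the", "of", "with", "very"}


def tokens(text):
    """Lowercased words with stopwords removed: the 'NLP without NLP'."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def cold_start(reviewed_beer, review_text, top_n=3):
    """Go up to the style, across to related styles, then back down to beers."""
    style = BEERS[reviewed_beer]["style"]
    candidate_styles = RELATED_STYLES.get(style, {style})
    review_words = tokens(review_text)
    ranked = []
    for name, beer in BEERS.items():
        if name == reviewed_beer or beer["style"] not in candidate_styles:
            continue
        overlap = len(review_words & tokens(name))  # review words found in the name
        ranked.append((overlap, beer["avg_score"], name))
    ranked.sort(reverse=True)  # overlap first, review score as the grounding
    return [name for _, _, name in ranked[:top_n]]


picks = cold_start("Hoppy Trail", "Lots of citrus and a very dry finish")
```

In this sample the highest-scored beer in the catalogue is a Stout, and it is never considered, which is the point of walking the style hierarchy before ranking by score.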
Cypher Development: “Cold Start” Solution
Will demonstrate results at the end
Cypher Development: Cosine Similarity
This query is designed to be able to take the user on a "Beer Walk" using review scores
◦ But... review scores are multi-dimensional and can go in different directions
◦ Some users might score the same beer more highly on palate vs aroma vs appearance
Because the overall review is in fact a vector of scores, the cosine similarity metric can be deployed
◦ This allows beers to be matched based on sharing similar overall patterns of scoring, not just the total or average score
When a user selects a beer, the algorithm will find beers that have a similar pattern of review scoring first
◦ Then a randomizer breaks the tie by selecting the next best beer
◦ Over multiple rotations, this algorithm will take the user on a weighted "random walk" through the beer landscape
This algorithm can be paired with the Cold Start as an ensemble or as a standalone
Overall, these two algorithms can be paired together to guide users through the "Universe of Beers" and pick the best matches while maintaining diversity
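A minimal sketch of the "Beer Walk" idea in Python rather than Cypher. The score dimensions, the sample values, and the use of a similarity-weighted random choice among the top matches as the "randomizer" are assumptions; the original query may break ties differently.

```python
import math
import random

# Hypothetical average sub-scores per beer: (aroma, appearance, palate, taste)
SCORES = {
    "Hoppy Trail": (4.5, 3.5, 4.0, 4.5),
    "Citrus Haze": (4.4, 3.6, 4.0, 4.4),
    "Midnight Oats": (3.0, 4.5, 4.5, 3.5),
    "Golden Lager": (3.0, 4.0, 3.0, 3.0),
}


def cosine(a, b):
    """Cosine similarity: compares the pattern of scores, not their size."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_beers(beer, top_n=3):
    """Other beers ranked by similarity of their score pattern to `beer`."""
    ranked = sorted(
        ((cosine(SCORES[beer], vec), name) for name, vec in SCORES.items() if name != beer),
        reverse=True,
    )
    return ranked[:top_n]


def next_beer(beer, rng, top_n=3):
    """Pick among the closest beers at random, weighted by similarity."""
    candidates = similar_beers(beer, top_n)
    names = [name for _, name in candidates]
    weights = [sim for sim, _ in candidates]
    return rng.choices(names, weights=weights, k=1)[0]


def beer_walk(start, steps, seed=0):
    """Repeated selections form a weighted random walk through the beers."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_beer(path[-1], rng))
    return path


walk = beer_walk("Hoppy Trail", steps=5)
```

Because review scores are all positive, raw cosine values sit close to 1 for every pair, so the weights are nearly uniform; subtracting each dimension's mean before comparing is one possible way to spread them out.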
Cypher Development: Cosine Similarity
Will demonstrate results at the end
Leveraging Subject Matter Expertise
While human preference is personal, our palate and tastes change as we mature and move toward the
level of "connoisseur"
Using large-scale machine learning assumes that the average level of "palate education" applies to all
beer drinkers
◦ A study on wine expertise has shown how wine tastes and choices change over time
◦ Without accommodating this, there is a strong potential that the model is aggregating away a lot of data
Using parameterized dimensional recommendations allows expert humans to define the key dimensions on
which beers should be matched to individuals
◦ Machine learning is fundamentally regressive and pegs the user to their prior state
◦ With this approach, we can leverage machines to speed up searches across a broad range of products while
leveraging the knowledge of beer experts instead of relying on the "herd" or "crowd" to make recommendations
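One way to read "parameterized dimensional recommendations" is a score built from per-dimension signals whose weights are set by an expert rather than learned. The sketch below assumes each signal is already scaled to the range 0 to 1; the dimension names, candidate values, and the two weight profiles are hypothetical.

```python
def score_candidate(signals, weights):
    """Weighted sum of per-dimension signals; unlisted dimensions weigh 0."""
    return sum(weights.get(dim, 0.0) * value for dim, value in signals.items())


def recommend(candidates, weights, top_n=2):
    """Rank candidate beers under one expert-defined weight profile."""
    ranked = sorted(
        candidates,
        key=lambda name: score_candidate(candidates[name], weights),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical per-dimension signals for three candidate beers
CANDIDATES = {
    "Citrus Haze": {"style_match": 1.0, "name_match": 1.0, "score_pattern": 0.9},
    "Pine Resin Pale": {"style_match": 0.5, "name_match": 0.0, "score_pattern": 0.95},
    "Midnight Oats": {"style_match": 0.0, "name_match": 0.0, "score_pattern": 1.0},
}

# Hypothetical expert-chosen profiles for different levels of palate education
NOVICE = {"style_match": 0.6, "name_match": 0.3, "score_pattern": 0.1}
CONNOISSEUR = {"style_match": 0.0, "name_match": 0.0, "score_pattern": 1.0}

novice_picks = recommend(CANDIDATES, NOVICE)
connoisseur_picks = recommend(CANDIDATES, CONNOISSEUR)
```

Changing a profile is a parameter change, not a retraining job, so an expert can move a user toward new styles as their palate matures.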
Deploying to production
“It depends”
Let your performance requirements and data size dictate whether you need to precompute or
can run queries on the fly
How and where are your recommendations generated?
How do you measure and evaluate success of the recommendations?
-> Use recommendations to generate more data and facts
Rule-based+
Dimensions can evolve along their own timescale because we treat them individually. In a
standard ML algorithm they must all evolve at the same rate
Using Actions
PUT A HUME PARAMETERIZED ACTION HERE
Conclusion
Rule-based+
Be careful of ROI on data science projects
A few combinations of expertise-guided modules are likely better for highly dimensional data than
ML
Questions?

More Related Content

More from Neo4j

GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Neo4j
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j
 
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...Neo4j
 
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AIDeloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AINeo4j
 
Ingka Digital: Linked Metadata by Design
Ingka Digital: Linked Metadata by DesignIngka Digital: Linked Metadata by Design
Ingka Digital: Linked Metadata by DesignNeo4j
 
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24Neo4j
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxNeo4j
 
Identification of insulin-resistance genes with Knowledge Graphs topology and...
Identification of insulin-resistance genes with Knowledge Graphs topology and...Identification of insulin-resistance genes with Knowledge Graphs topology and...
Identification of insulin-resistance genes with Knowledge Graphs topology and...Neo4j
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...
EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...
EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...Neo4j
 
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptxGraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptxNeo4j
 

More from Neo4j (20)

GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with Graph
 
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
 
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AIDeloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
 
Ingka Digital: Linked Metadata by Design
Ingka Digital: Linked Metadata by DesignIngka Digital: Linked Metadata by Design
Ingka Digital: Linked Metadata by Design
 
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
 
Identification of insulin-resistance genes with Knowledge Graphs topology and...
Identification of insulin-resistance genes with Knowledge Graphs topology and...Identification of insulin-resistance genes with Knowledge Graphs topology and...
Identification of insulin-resistance genes with Knowledge Graphs topology and...
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...
EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...
EY: Graphs as Critical Enablers for LLM-based Assistants- the Case of Custome...
 
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptxGraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
 

Recently uploaded

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Recently uploaded (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

Leveraging Dimensionality for Graph-Based Recommendations with Sparse Data

  • 1. Leveraging Dimensionality for Graph-Based Recommendations with Sparse Data WILL EVANS | GRAPHABLE © GRAPHABLE, INC 2020
  • 2. Intro Will Evans, VP Strategy and Innovation at Graphable Working with Neo4j for 4+ years My first project was a recommendation engine for blogs, which I did completely wrong This will be recommendations + experience to help guide you in a better direction Graphable: Consultancy, Exclusive Hume Reseller in the US © GRAPHABLE, INC 2020
  • 3. Beer Graph Recommendations The goal for today’s session is to start with a dataset and a business goal, and take you all the way from schema design to usable results that could be deployed 1. Use Case 2. Schema Design 3. Data Ingestion 4. Cypher Development 5. Finding Similarity for Recommendations 6. Conclusion © GRAPHABLE, INC 2020
  • 4. Use Case: Recommendations Recommendation engines are always critical for a good customer experience These recommendation engines are especially necessary when the customer is faced with a large product assortment where searching is not feasible, or product discovery is desired Additionally, recommendation engines are critical to personalization and making sure each individual customer is shown the products they are most likely to want to buy ◦ At the same time, we want to be able to show each customer the most diverse set of products as possible within this space But, usually business with large assortments of products and a customer base that would benefit from personalization: 1. Have sparse data that doesn't lend itself to most machine learning algorithms 2. Involve a customer pool of unique individuals with diverse taste profiles © GRAPHABLE, INC 2020
  • 5. Progress, not perfection DON’T LET THE PERFECT BE THE ENEMY OF THE GOOD Nick Dolding/Getty Images © GRAPHABLE, INC 2020
  • 6. Schema Design: Goal and Data GOAL: Recommend beer to users. DATA: - Sparse, high dimensional data - Dimensions: beer, beer style, brewer, review text, glass type, flavors, beer name, beer name sentiment, country, user beer journey/expertise, review sentiment, etc. etc. - Facts: Users, review scores - P > N - Most Beers have 1 or fewer reviews, most Users have reviewed 2 or fewer beers © GRAPHABLE, INC 2020
  • 7. Schema Design: The design With sparse, high dimensional data the schema design is often simple. In our instance: 1. Beer is the center of our universe, so we start with the beer. Where does beer come from? 2. Beer comes from a Brewer, and Brewers have multiple beers. 3. Beers also are of a certain style, and Styles have many beers 4. Users are unrelated to any of other entities, so they become an additional entity 5. Reviews only exist as an interaction of User and Beer © GRAPHABLE, INC 2020
  • 8. Data Ingestion: Source Data We used Orchestra in GraphAware’s Hume for loading data, but a transformation script and LOAD CSV or a bulk load would work just as well © GRAPHABLE, INC 2020
  • 9. Data Ingestion: Brewer Names © GRAPHABLE, INC 2020
  • 10. Cypher Development: Overview Don’t boil the ocean Start with individual modules and unify them at the end ◦ Test review scores by X dimension, and explore results ◦ Start aggregating and weighting ◦ Avoid complexity if it’s not needed. You can achieve good, repeatable results from a hard dataset using a collection of basic techniques ◦ Rule-based+ Clustering algorithms, node2vec, etc. have diminishing returns, and results depend heavily on the data size. Do a thorough cost analysis for ROI of the improved accuracy compared to the effort after you’ve already deployed and measured good repeatable recommendations in production © GRAPHABLE, INC 2020
  • 11. Cypher Development: Building Recommendations Building a recommendation engine with sparse, high-dimensional data is a tricky process ◦ Cold (cool?) start problem ◦ Many users have reviewed 1 beer, and many beers have only 1 review ◦ Very little overlap across users and beers they've reviewed ◦ Lack of any discernable pattern to walk between Users and Beers and find “similar” ◦ Some brewers have many beers, some have few ◦ Overlapping beer styles that do not clearly identify beer flavor What we cannot and should not do: ◦ Recommend beers based on co-review occurrence (collaborative filtering) ◦ Base recommendation on beers with highest scores ◦ Simply parse review text to extract vector embeddings to find similar reviews and then recommend the corresponding beers Proceed carefully to prevent narrow and mismatched recommendations © GRAPHABLE, INC 2020
  • 12. Cypher Development: "Cold Start" Solution Introducing new products to people is never easy: push the wrong product and bad things happen. Since so many users have only reviewed one or two beers, they would never get relevant beers recommended to them. The solution is to leverage the hierarchy of beer to style, find highly related styles, and then drill back down to beer ◦ Even with this approach we are still left with too large a set ◦ Ranking by review score is still not enough Next, a little NLP without NLP: leverage review text to find words that show up in beer names ◦ Adds an additional dimension to the search ◦ Matches a user's review phrasing against beer names to find candidate beers, with the match grounded by review scores
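A minimal Python sketch of the cold start steps described above (style hierarchy first, then review words matched against beer names, then review score). The slides do not show the actual Cypher, so everything here is an assumption made for illustration: the in-memory data, the `related_styles` mapping (assumed to be precomputed), the tiny stopword list, and the ranking rule of name-word overlap first and average score second.

```python
import re

# Hypothetical stopword list; a real deployment would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "with", "is", "it", "this"}


def tokens(text):
    """Lowercased set of words in `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def cold_start(user_reviews, beers, related_styles, top_n=3):
    """Recommend beers for a user who has only one or two reviews.

    user_reviews:   list of (beer_id, review_text)
    beers:          dict beer_id -> {"name", "style", "avg_score"}
    related_styles: dict style -> set of related styles (assumed precomputed)
    """
    reviewed = {beer_id for beer_id, _ in user_reviews}

    # 1. Walk up from reviewed beers to their styles, then across to related styles.
    styles = set()
    for beer_id in reviewed:
        style = beers[beer_id]["style"]
        styles.add(style)
        styles |= related_styles.get(style, set())

    # 2. Collect the words this user wrote in their reviews.
    words = set()
    for _, text in user_reviews:
        words |= tokens(text)

    # 3. Drill back down to unreviewed beers in those styles and rank them by
    #    name-word overlap, then by average review score.
    candidates = []
    for beer_id, beer in beers.items():
        if beer_id in reviewed or beer["style"] not in styles:
            continue
        overlap = len(words & tokens(beer["name"]))
        candidates.append((overlap, beer["avg_score"], beer_id))
    candidates.sort(key=lambda c: (-c[0], -c[1], c[2]))
    return [beer_id for _, _, beer_id in candidates[:top_n]]


# Made-up example data, not taken from the beer dataset in the talk.
beers = {
    "b1": {"name": "Old Oak Stout", "style": "Stout", "avg_score": 4.1},
    "b2": {"name": "Midnight Chocolate Porter", "style": "Porter", "avg_score": 3.9},
    "b3": {"name": "Roasted Coffee Stout", "style": "Stout", "avg_score": 3.7},
    "b4": {"name": "Citrus Haze IPA", "style": "IPA", "avg_score": 4.5},
    "b5": {"name": "Harbor Porter", "style": "Porter", "avg_score": 4.3},
}
related_styles = {"Stout": {"Porter"}}
user_reviews = [("b1", "Lots of roasted coffee and dark chocolate on the finish")]

print(cold_start(user_reviews, beers, related_styles))  # → ['b3', 'b2', 'b5']
```

In this toy data the highest-scoring beer overall (`b4`) is never recommended because its style is unrelated, and the lower-scored `b3` outranks `b5` because its name echoes the user's own review wording.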
  • 13. Cypher Development: “Cold Start” Solution Will demonstrate results at the end
  • 14. Cypher Development: Cosine Similarity This query is designed to take the user on a "Beer Walk" using review scores. But review scores are multi-dimensional and can go in different directions: some users might score the same beer more highly on palate than on aroma or appearance. Because the overall review is in fact a vector of scores, the cosine similarity metric can be applied. This allows beers to be matched on similar overall patterns of scoring, not just the total or average score. When a user selects a beer, the algorithm first finds beers that have a similar pattern of review scoring. Then a randomizer breaks ties by selecting the next best beer. Over multiple rotations, this algorithm takes the user on a weighted "random walk" through the beer landscape. It can be used standalone or paired with the Cold Start solution as an ensemble. Together, the two algorithms guide users through the "Universe of Beers" and pick the best matches while maintaining diversity.
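The "Beer Walk" idea can be sketched in plain Python. This is not the query from the talk. It assumes each beer is summarized by a vector of average (palate, aroma, appearance) scores, and it interprets the "randomizer" as a similarity-weighted random choice among the `top_k` most similar unvisited beers. The function names, `top_k`, and the example vectors are all hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length score vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def next_beer(current, vectors, visited, rng, top_k=3):
    """Rank unvisited beers by similarity to `current`, keep the top_k,
    then pick one at random, weighted by similarity."""
    scored = [
        (cosine(vectors[current], vec), beer)
        for beer, vec in vectors.items()
        if beer != current and beer not in visited
    ]
    if not scored:
        return None
    scored.sort(reverse=True)
    top = scored[:top_k]
    weights = [max(score, 1e-9) for score, _ in top]  # keep weights strictly positive
    return rng.choices([beer for _, beer in top], weights=weights, k=1)[0]


def beer_walk(start, vectors, steps, seed=None):
    """Take up to `steps` hops from `start`, never revisiting a beer."""
    rng = random.Random(seed)
    path, visited = [start], {start}
    for _ in range(steps):
        nxt = next_beer(path[-1], vectors, visited, rng)
        if nxt is None:
            break
        path.append(nxt)
        visited.add(nxt)
    return path


# Made-up average (palate, aroma, appearance) scores per beer.
vectors = {
    "stout_a": (4.5, 2.0, 3.0),
    "stout_b": (4.4, 2.2, 3.1),
    "ipa_a": (2.0, 4.5, 3.0),
    "lager_a": (3.0, 3.0, 3.5),
}

# Same total score, different scoring pattern: similarity is well below 1.
print(round(cosine(vectors["stout_a"], vectors["ipa_a"]), 3))
print(beer_walk("stout_a", vectors, steps=3, seed=42))
```

`stout_a` and `ipa_a` have identical total scores, yet their cosine similarity is clearly lower than that of `stout_a` and `stout_b`, which is the distinction the slide draws between matching on scoring patterns and matching on totals or averages.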
  • 15. Cypher Development: Cosine Similarity Will demonstrate results at the end
  • 16. Leveraging Subject Matter Expertise While human preference is personal, our palate and tastes change as we mature and move toward the level of "connoisseur". Large-scale machine learning assumes that the average level of "palate education" applies to all beer drinkers ◦ A study on wine expertise has shown how wine tastes and choices change over time ◦ Without accommodating this, there is a strong potential that the model is aggregating away a lot of data Parameterized dimensional recommendations allow expert humans to define the key dimensions on which beers should be matched to individuals ◦ Machine learning is fundamentally regressive and pegs the user to their prior state ◦ With this approach, we can leverage machines to speed up searches across a broad range of products while leveraging the knowledge of beer experts instead of relying on the "herd" or "crowd" to make recommendations
  • 17. Deploying to production “It depends.” Your performance requirements and data size dictate whether you need to precompute or can run queries on the fly. How and where are your recommendations generated? How do you measure and evaluate the success of the recommendations? -> Use recommendations to generate more data and facts Rule-based+ Dimensions can evolve along their own timescales because we treat them individually. In a standard ML algorithm they must all evolve at the same rate.
  • 18. Using Actions PUT A HUME PARAMETERIZED ACTION HERE
  • 19. Conclusion Rule-based+ Be careful of ROI on data science projects. A few combined, expertise-guided modules are likely better than ML for high-dimensional data. Questions?