Haystacks slides

This talk features some of my recent research into alternative uses for Solr facets and facet metadata. I will develop the idea that facets can be used to discover similarities between items and attributes in a search index, and show some interesting applications of this idea. The common takeaway is that using facets and facet metadata in unconventional ways enables the semantic context of a query to be tuned automatically. This has important implications for user-centric and semantically focused relevance.

Facets and Similarity
Exploring the Meta-Informational Hyperspace
Ted Sullivan
Lucidworks, Inc.
Information Spaces
Use Cases: Search and Discovery
Knowledge Spaces
Asking the right question (knowing what questions to ask)
Navigation and Visualization
Alexa: Who’s on first?
What’s the first baseman’s name?
Relevance - Similarity - Precision - Classification
Vectors and Vector Spaces
Are information spaces like Euclidean or Cartesian spaces?
Knowledge Bases
Lamp Table
Side Table
Table Lamp
Facet Synonyms - Spatial Metaphors
Parameters
Dimensions
Navigators
Refiners
Supports the notion of some kind of n-dimensional information “space”
I call it a meta-informational space
Traditional Uses - Navigation and Visualization
Verity K2
Endeca
Fast ESP
MS Fast
Contexts are Viewpoints / Perspectives in Information Space
“The circumstances that form the setting for an event, statement, or idea, and in
terms of which it can be fully understood and assessed.”
Personal Contexts
Who is searching?

What are their roles / interests?

What have they searched for in the past (including just now)?

What are they allowed to search for?
Semantic Contexts
Homonym / Polysemy Problem

“apple”

Tech company, Horticulture, Food, Music, New York City

What is the subject area or domain?
Contexts
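
One way to picture the "apple" problem is to count facet values over the documents that match the ambiguous term: each facet value is a candidate semantic context. The sketch below is only an illustration on a toy in-memory index; the documents, the `category` field, and the `facet_counts` helper are all hypothetical and do not come from the talk or from Solr itself.

```python
from collections import Counter

# Hypothetical toy index: each document has free text plus a `category` facet.
DOCS = [
    {"text": "apple releases new phone", "category": "Tech company"},
    {"text": "apple laptop review", "category": "Tech company"},
    {"text": "pruning an apple tree", "category": "Horticulture"},
    {"text": "apple pie recipe", "category": "Food"},
    {"text": "banana bread recipe", "category": "Food"},
]


def facet_counts(term, docs, field):
    """Count the values of `field` over documents whose text contains `term`."""
    return Counter(d[field] for d in docs if term in d["text"].split())


# Each facet value with a non-zero count is one possible context for "apple".
counts = facet_counts("apple", DOCS, "category")
for value, count in counts.most_common():
    print(value, count)
```

A personal context (for example, a user known to be interested in cooking) could then be used to prefer one of these facet values over the others.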
Facet - Similarity Theorem
Lemma 1: Similar things tend to occur in similar contexts
Lemma 2: Facets are a tool for exploring meta-informational contexts
Theorem: Facets can be used to find similar things.
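
As a minimal sketch of the theorem, items can be compared by the facet values they share. The example below assumes Jaccard overlap of field = value pairs as the similarity measure; that choice, the sample items, and the function names are illustrative assumptions, not the method presented in the talk.

```python
def facet_pairs(item):
    """Represent an item as a set of (field, value) facet pairs."""
    return {(field, value) for field, value in item.items()}


def jaccard(a, b):
    """Jaccard overlap of two items' facet pairs (0.0 when both are empty)."""
    pa, pb = facet_pairs(a), facet_pairs(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


# Hypothetical catalog entries described only by their facets.
ITEMS = {
    "table lamp": {"type": "lamp", "material": "brass", "color": "red"},
    "floor lamp": {"type": "lamp", "material": "brass", "color": "black"},
    "side table": {"type": "table", "material": "wood", "color": "black"},
}


def most_similar(name, items):
    """Return the other item whose facet pairs overlap most with `name`."""
    target = items[name]
    scores = {
        other: jaccard(target, attrs)
        for other, attrs in items.items()
        if other != name
    }
    return max(scores, key=scores.get)


print(most_similar("table lamp", ITEMS))
```

Items that occur in similar facet contexts score higher, which is the intuition behind Lemma 1.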
Facets
“A particular aspect or feature of something.”
Facets are Metadata
“Data about data” - attributes, aspects, descriptors, features, properties, traits
Metadata Semantics: what, where, when, why
name, size, shape, color, material, texture
manufacturer, number of outlets, voltage, is pre-assembled …
address, phone number, birth date, user rating
Metadata Contexts: Some metadata fields depend on “what” the “thing” is,
e.g. People have different attributes than Toaster Ovens
Metadata provides Semantic Mappings
Consist of field name = field value pairings
Map Terms to Concepts
The term ‘red’ is known to be a ‘color’ because it is a
value in the ‘color’ field
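
The term-to-concept mapping can be sketched by inverting field name = field value pairs, so that each value points back to the field or fields it appears in. The records and the `build_term_to_fields` helper below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical records made of field name = field value pairings.
RECORDS = [
    {"name": "toaster oven", "color": "red", "material": "steel"},
    {"name": "table lamp", "color": "black", "material": "brass"},
]


def build_term_to_fields(records):
    """Map each field value (term) to the set of field names (concepts) it occurs in."""
    mapping = defaultdict(set)
    for record in records:
        for field, value in record.items():
            mapping[value].add(field)
    return mapping


term_to_fields = build_term_to_fields(RECORDS)
print(term_to_fields["red"])  # 'red' is a value in the 'color' field
```

A term that appears as a value in more than one field would map to several concepts, which is one place where context is needed to disambiguate.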

Haystacks slides

  • 1. Facets and Similarity Exploring the Meta-Informational Hyperspace Ted Sullivan Lucidworks, Inc.
  • 2. Information Spaces Use Cases: Search and Discovery Knowledge Spaces Asking the right question (knowing what questions to ask) Navigation and Visualization Alexa: Who’s on first? What’s the first baseman’s name? Relevance - Similarity - Precision - Classification Vectors and Vector Spaces Are Information spaces like Euclidean or Cartesian spaces? Knowledge Bases Lamp Table Side Table Table Lamp
  • 3. Facet Synonyms - Spatial Metaphors Parameters Dimensions Navigators Refiners Supports the notion of some kind of n-dimensional information “space” I call it a meta-informational space Traditional Uses - Navigation and Visualization Verity K2 Endeca Fast ESP MS Fast
  • 4. Contexts are Viewpoints / Perspectives in Information Space “The circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood and assessed.” Personal Contexts Who is searching? What are their roles / interests? What have they searched for in the past (including just now)? What are they allowed to search for? Semantic Contexts Homonym / Polysemy Problem “apple” Tech company, Horticulture, Food, Music, New York City What is the subject area or domain? Contexts
  • 5. Facet - Similarity Theorem Lemma 1: Similar things tend to occur in similar contexts Lemma 2: Facets are a tool for exploring meta-informational contexts Theorem: Facets can be used to find similar things.
  • 6. Facets “A particular aspect or feature of something.” Facets are Metadata “Data about data” - attributes, aspects, descriptors, features, properties, traits Metadata Semantics: what, where, when, why name, size, shape, color, material, texture manufacturer, number of outlets, voltage, is pre-assembled … address, phone number, birth date, user rating Metadata Contexts: Some metadata fields depend on “what” the “thing” is, e.g. People have different attributes than Toaster Ovens Metadata provides Semantic Mappings Consist of field name = field value pairings Map Terms to Concepts The term ‘red’ is known to be a ‘color’ because it is a value in the ‘color’ field
  • 7. Metadata as a Knowledge Base Faceted Navigation - Top-down or drill-in Search - More direct or bottom-up Query Autofiltering: Uses facet metadata in search collection to determine semantic meaning of search terms. Semantic Knowledge is Power Can use this built-in knowledge to “short-circuit” the “search then drill in” paradigm Metadata cardinality and “Boolean in the Vernacular” Semantic Pattern Rules $DRUG treats $SYMPTOM vs. $DRUG causes $SYMPTOM $Accessory for $Product (e.g. “case for iPhone”) Enables precise bottom-up search Dependent on metadata quality and completeness Improving metadata can improve search too
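The query autofiltering idea on slide 7 can be illustrated with a minimal sketch. It assumes a small in-memory table standing in for the facet metadata of a search collection; `FACET_VALUES` and `autofilter` are hypothetical names used only for illustration, not Solr or Lucidworks APIs, and a real implementation would look the values up in the index.

```python
# Hypothetical stand-in for facet metadata: field name -> known field values.
FACET_VALUES = {
    "color": {"red", "blue", "white"},
    "product_type": {"sofa", "table lamp", "side table"},
}

def autofilter(query):
    """Split a free-text query into (field, value) filters and leftover terms."""
    tokens = query.lower().split()
    filters, leftover = [], []
    i = 0
    while i < len(tokens):
        matched = False
        # Prefer the longest phrase that starts at position i.
        for j in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:j])
            field = next(
                (f for f, values in FACET_VALUES.items() if phrase in values),
                None,
            )
            if field is not None:
                filters.append((field, phrase))
                i = j
                matched = True
                break
        if not matched:
            leftover.append(tokens[i])
            i += 1
    return filters, leftover

filters, leftover = autofilter("red table lamp cheap")
```

Here ‘red’ is recognized as a color and ‘table lamp’ as a product type because each is a known value of that field, while ‘cheap’ stays a free-text term.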
  • 8. Categorical vs. Numerical Some metadata is non-numerical - i.e. categorical Similarity in numerical hyper-spaces is modeled as Euclidean space Numerical Similarity Search Relevance / Clustering / Learning Algorithms Use Term Probability Vectors (tf/idf) in unstructured text Everything must be a number - categorical data is indexed (arbitrary) Similarity is based on linear or angular closeness of vectors Detects patterns - which may not be intuitive => black-box models Facet-based Similarity Similarity based on shared categorical and numerical ranges Numerical data are ranged or “binned” to be compatible with category
  • 9. Navigating Categorical Spaces Pivot Facets: Paths or trajectories through categorical spaces Multi-Dimensional Query Suggester Pivot Patterns - Semantically “sensible” permutations of metadata fields $First_Name $Last_Name $Occupation $City $State Bob Jones Accountant Cincinnati OH Multi-dimensional queries and precision Users expect greater precision in results (i.e. fewer) when they add refining information to the query Traditional “bag-of-words” search algorithms often fail to deliver on this expectation Typeahead solutions should show queries that will “work” - rapid visualization of available content Query Autofiltering solution is tailored to this since it can navigate the same categorical space that the pivot patterns generate Gotcha: Both solutions depend on accurate and complete metadata at sufficient level of granularity.
  • 10. Adding Context to Suggester via Facets Suggestions are validated against content collection Facets are used to acquire contextual metadata from a content collection while building a typeahead collection Use Cases Security Trimming of Suggestions If a query only hits on secured documents, do not want to show that query to users that cannot see any of the documents Solution: Use facets to get the list of ACLs that are associated with a query Dynamic boosting of suggestions based on previous searches Contextual metadata added to typeahead collection boosts similar suggestions Solution: In typeahead application retain context metadata for selected queries and re-send it as boost queries in subsequent typeahead requests Facet-Similarity Theorem at work
  • 11. Building a Suggester with Dynamic Context Uses Facet Queries against a Content Collection to create additional metadata for the Suggester or Typeahead Collection. This contextual metadata can then be used for: • Security Trimming of Typeahead suggestions • Dynamic boosting of similar suggestions within a user session
  • 12. Building a Suggester with Dynamic Context Bring back other fields in addition to displayed suggestion text (i.e., the ones that were calculated using faceting) If a query is used to search, temporarily store its associated metadata in a circular cache on the browser. When submitting the next typeahead query, add the cached information from the queue as boost queries. Type ‘j’ - get back Jai Johnny Johanson Bands Jai Johnny Johanson Groups J.J. Johnson Jai Johnny Johanson Juke Joint Jezebel Juke Joint Jimmy Just searched for ‘Paul McCartney’ then type ‘j’ John Lennon John Lennon Songs John Lennon Songs Covered James P Johnson Songs (?) John Lennon Originals Hey Jude
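The circular cache described on slide 12 could look roughly like the sketch below. It is written in Python for brevity even though the slide places the cache in the browser. The cache size, the `band_ss` field, and the function names are assumptions made for illustration; `bq` is the boost query parameter of Solr's DisMax-style query parsers, and how `q` is built depends on how the typeahead collection is indexed.

```python
from collections import deque

# Keep metadata from only the most recent searches (size 3 is an assumption).
context_cache = deque(maxlen=3)

def remember_search(metadata):
    """Store the facet-derived metadata of a query the user actually ran."""
    context_cache.append(metadata)

def typeahead_params(prefix):
    """Build request parameters, adding one boost query per cached value."""
    boosts = []
    for metadata in context_cache:
        for field, values in metadata.items():
            for value in values:
                boosts.append(f'{field}:"{value}"')
    return {"q": prefix, "bq": boosts}

# After searching for a Beatles member, typing 'j' carries that context along.
remember_search({"band_ss": ["The Beatles"]})
params = typeahead_params("j")
```

Because the deque has a fixed maximum length, older context falls out automatically as the user's session moves on to other subjects.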
  • 13. Structured vs. Unstructured Data Faceted navigation requires structured data Search is designed to handle unstructured data Query Autofiltering enables precise search of structured data without complex Query Language - Builds structured query from the inherent semantics of a “free text” query “Who’s In The Who” Structured Data = has metadata Real-World - Data is imperfect / incomplete Generally speaking there is not enough structure eCommerce - tends to focus on top-down due to ubiquity of faceted navigation e.g. “semi-structured” Enterprise Search - document rich Available metadata is poor in describing “Aboutness”
  • 14. Analyzing Text to Extract Metadata Search Engine Analyze text to create “inverted index” and to parse the query Text Mining Analyze text to extract entities, concepts, categories Goal - Improved Metadata Through Text Mining Case 1: Extracting product type and product attributes metadata from short product descriptions in eCommerce data - dealing with precision and recall Use “directed” NLP techniques to extract precise metadata. “Coffee Pods for Keurig Coffee Makers” Case 2: Large text documents. Want to extract keywords and assign categories to documents. Add metadata concerning “aboutness”
  • 15. Auto phrasing Auto Phrasing - Multi-term phrases that refer to a single entity. - Uses knowledge from a curated phrase list to determine what is an auto phrase - Works on tokenized text fields (implemented as a Lucene TokenFilter) Importance of Noun Phrases Want “things” to be treated as such Pre-emptive solution for ambiguities and miss or cross matches down the road Examples “data scientist” - not “data” but a person “garbage collection” - a JVM process - has nothing to do with a search “collection” “query pipeline” vs “span query” - LW Fusion thing vs Lucene thing “query” is a noise word in LW blogs corpus, “span query” and “phrase query” are keywords
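As a rough illustration of auto phrasing, the sketch below merges token runs that match a curated phrase list into single tokens. The slide describes a Lucene TokenFilter; this Python version only shows the matching logic, and the phrase list, the underscore joining, and the `autophrase` name are assumptions.

```python
# Hypothetical curated phrase list, stored as token tuples.
PHRASES = {("data", "scientist"), ("garbage", "collection"), ("span", "query")}
MAX_LEN = max(len(phrase) for phrase in PHRASES)

def autophrase(tokens):
    """Merge runs of tokens that match a curated phrase into one token."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-token phrases.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in PHRASES:
                out.append("_".join(candidate))
                i += n
                break
        else:
            # No phrase starts here, so emit the single token unchanged.
            out.append(tokens[i])
            i += 1
    return out

tokens = autophrase("the jvm garbage collection pause".split())
```

With this in place, ‘garbage collection’ is indexed as one “thing” and no longer cross-matches a query about a search ‘collection’.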
  • 16. Keyword Clustering using Facets Information Theory Keywords have high “Entropy” - meaning that their distribution is not uniform within a collection of documents, but tends to be localized to documents about a related topic. Keywords and Topics Keywords are rare within a document corpus but common within a subset of documents on the same subject area or topic Keywords used in the same subject domain will be clustered or co-located Application of Facet-Similarity Theorem Use facets on unstructured text to find terms that are co-located by computing simple facet ratios for positive and negative queries Keyword clusters can then be used for topic mapping
  • 17. Data Mining with Facets Method: • Tokenize text with auto phrasing, stop words and synonyms - store tokens in a multi-valued field with DocValues - (yes you can facet on a text field but it tends to hit a wall - 2M word limit on facet values) • Using the /terms handler, get each term in the text field. • For each term, submit two queries - one with text_field:[term] (positive Q) - one with -text_field:[term] (negative Q) • For each facet value (other terms) calculate the following ratio: (Facet Counts (Positive Q) / Total Counts (Positive Q)) / (Facet Counts (Negative Q) / Total Counts (Negative Q)) • Take the X log(X) of this ratio (for better discrimination) - for each term, take the best related terms above some threshold
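The facet-ratio method on slide 17 can be sketched on a toy in-memory corpus. The documents, the threshold, and the add-one smoothing (used to avoid dividing by zero when a term never occurs in the negative set) are assumptions; the slides do not say how that case is handled, and a real run would issue the positive and negative queries against Solr instead of filtering Python lists.

```python
import math

# Toy corpus: each document is the set of tokens it contains (made-up data).
DOCS = [
    {"ldap", "password", "login"},
    {"ldap", "password", "ssl"},
    {"heap", "jvm", "pause"},
    {"heap", "jvm", "login"},
    {"facet", "query", "login"},
    {"password", "heap"},
]

def facet_counts(docs):
    """Count, for each token, how many of the given documents contain it."""
    counts = {}
    for doc in docs:
        for token in doc:
            counts[token] = counts.get(token, 0) + 1
    return counts

def related_terms(term, docs, threshold=1.0):
    """Score co-occurring terms by X log(X) of the positive/negative facet ratio."""
    positive = [d for d in docs if term in d]      # like text_field:term
    negative = [d for d in docs if term not in d]  # like -text_field:term
    pos_counts = facet_counts(positive)
    neg_counts = facet_counts(negative)
    scores = {}
    for other, pos in pos_counts.items():
        if other == term:
            continue
        # Assumption: add-one smoothing on the negative side.
        neg = neg_counts.get(other, 0) + 1
        ratio = (pos / len(positive)) / (neg / (len(negative) + 1))
        score = ratio * math.log(ratio)
        if score > threshold:
            scores[other] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

related = dict(related_terms("ldap", DOCS))
```

In this toy corpus ‘password’ scores above the threshold for ‘ldap’, while ‘login’, which is spread evenly across unrelated documents, gets a ratio below 1 and is dropped.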
  • 18. Facet Ratios => Keyword Clusters Security ldap 727.7777777777777 permission 540.6349206349206 authentication 499.04761904761904 secure 320.22222222222223 password 231.70068027210885 identity 207.93650793650795 user name 182.984126984127 ssl 152.48677248677248 login 124.76190476190476 port 93.57142857142857 protocol 90.1058201058201 remote 77.97619047619048 connector 74.9288451012589 installation 70.88744588744588 mechanism 69.31216931216932 jetty 57.76014109347443 native 57.582417582417584 directory 43.477633477633475 sharepoint 38.98809523809524 restrict 34.65608465608466 plugin 30.114942528735632 dashboard 28.791208791208792 communicate 23.764172335600907
  • 19. Facet Ratios => Keyword Clusters Garbage Collection garbage 1048.4615384615383 pause 813.4615384615383 heap 581.0439560439561 xx 397.6923076923077 bottleneck 325.38461538461536 collector 278.9010989010989 jvm 253.07692307692307 collect 195.23076923076923 crash 144.6153846153846 thread 135.5769230769231 scheme 116.20879120879121 concern 100.11834319526628 low 97.9652605459057 memory 91.77514792899409 slower 90.38461538461539 timestamp 75.08875739644971 log file 72.3076923076923 disk 52.36074270557029 general 46.01398601398601 generation 44.88063660477454 delete 41.26254180602007 size 38.85189437428244 efficient 38.280542986425345 specify 31.990060501296455
  • 20. Keyword Vector Document Clustering Use the Keyword Vectors to compute distances between documents rather than raw TF/IDF => Higher Signal To Noise Tokenizer Compute Keyword Vector K-Means Clustering Cluster: 98 stump_the_chump: 15159.8533727 stump: 12931.0599949 prize: 12378.4630507 sight: 2943.0123456 tough: 2872.89050924 question: 2827.60450268 judge: 2353.93441007 submit: 2250.3503055 session: 2147.89226715 panel: 1888.9584879 hostetter: 1722.90005854 grant: 1600.7415686 chump: 1558.95135161 lucene_revolution: 1353.7746721 spot: 1211.58699335 award: 1048.0824900 mock: 1005.09316809 conference: 903.00251411 muir: 878.76730374 seat: 870.91541559 hot: 799.50707482
  • 21. Topic Mapping Semantic or Subject / Categorical Spaces security performance garbage collection authorization saml kerberos username login permission acl qps bottleneck latency generation jvm pause stop the world heap optimization concurrent xx speed password
  • 22. Topic Mapping <doc> <field name="label_s">Solr/Lucene Tech</field> <field name="term_ss">solr</field> <field name="term_ss">lucene</field> <field name="term_ss">search handler</field> <field name="term_ss">request handler</field> <field name="term_ss">solrj</field> <field name="term_ss">term query</field> <field name="term_ss">boolean query</field> <field name="term_ss">span query</field> <field name="term_ss">phrase query</field> <field name="term_ss">queryparser=>query parser</field> <field name="term_ss">fq=>filter query</field> <field name="term_ss">function query</field> <field name="term_ss">bq=>boost query</field> <field name="term_ss">solrconfig xml=>solrconfig.xml</field> <field name="term_ss">edismax</field> <field name="term_ss">dismax</field> <field name="term_ss">analysis=>analyzer</field> <field name="term_ss">positionincrementgap</field> <field name="term_ss">highlighter</field> <field name="term_ss">similarity</field> <field name="term_ss">search index,lucene index=>inverted index</field> <field name="term_ss">token</field> <field name="term_ss">token filter,tokenfilter=>tokenizer</field> <field name="term_ss">field type=>fieldtype</field> <field name="term_ss">schema.xml,schema xml=>schema</field> <field name="term_ss">facet</field> <field name="term_ss">frange=>range</field> <field name="term_ss">trie</field> <field name="term_ss">pivot faceting,facet.pivot,facet pivot=>pivot facet</field> <field name="term_ss">reference guide</field> </doc>
  • 23. Topic Mapping Approach Keyword coverage is more important than density Many-to-Many mapping Documents may cover more than one topic A given keyword may occur in more than one topic area “Democratic” process Keywords are “evidence” for a Topic - “aboutness” is cumulative Enables documents to be mapped to multiple topics - which gives information on topic relatedness Simple threshold to determine topic membership
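The topic mapping approach on slide 23 can be sketched as a coverage count with a simple threshold. The topic keyword lists, the threshold value, and the `map_topics` name are assumptions for illustration; the point is that each distinct keyword counts once as evidence (coverage over density) and a document may land in several topics.

```python
# Hypothetical topic -> keyword sets; a keyword may belong to several topics.
TOPICS = {
    "security": {"ldap", "password", "permission", "ssl", "login"},
    "garbage collection": {"heap", "jvm", "pause", "collector"},
    "performance": {"jvm", "latency", "qps", "bottleneck"},
}

def map_topics(doc_tokens, threshold=2):
    """Return topics whose count of distinct matched keywords meets the threshold."""
    present = set(doc_tokens)  # coverage: repeated keywords count only once
    scores = {topic: len(keywords & present) for topic, keywords in TOPICS.items()}
    return {topic: n for topic, n in scores.items() if n >= threshold}

# 'ldap' repeated three times still counts as one piece of evidence.
single = map_topics("ldap ldap ldap login password jvm".split())

# A document can be mapped to more than one topic.
multi = map_topics("jvm heap pause latency bottleneck".split())
```

Documents that map to two topics at once, as in the second call, are what give information on how those topics relate to each other.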