The need for more sophisticated search implementations is often at odds with the limited feature set available in modern out-of-the-box open source search engines.
This presentation discusses the challenges of properly modeling information within a domain and why doing so is critical.
The Enterprise Knowledge Graph is a disruptive platform that combines emerging Big Data and Graph technologies to reinvent knowledge management inside organizations. This platform aims to organize and distribute the organization’s knowledge, making it centralized and universally accessible to every employee. The Enterprise Knowledge Graph is a central place to structure, simplify and connect the knowledge of an organization. By removing complexity, the knowledge graph brings more transparency, openness and simplicity into organizations. That leads to democratized communication and empowers individuals to share knowledge and to make decisions based on comprehensive knowledge. This platform can change the way we work, challenge the traditional hierarchical approach to getting work done and help to unleash human potential!
Reflected Intelligence: Real world AI in Digital TransformationTrey Grainger
The goal of most digital transformations is to create competitive advantage by enhancing customer experience and employee success, so giving these stakeholders the ability to find the right information at their moment of need is paramount. Employees and customers increasingly expect an intuitive, interactive experience where they can simply type or speak their questions or keywords into a search box, their intent will be understood, and the best answers and content are then immediately presented.
Providing this compelling experience, however, requires a deep understanding of your content, your unique business domain, and the collective and personalized needs of each of your users. Modern artificial intelligence (AI) approaches are able to continuously learn from both your content and the ongoing stream of user interactions with your applications, and to automatically reflect back that learned intelligence in order to instantly and scalably deliver contextually-relevant answers to employees and customers.
In this talk, we'll discuss how AI is currently being deployed across the Fortune 1000 to accomplish these goals, both in the digital workplace (helping employees more efficiently get answers and make decisions) and in digital commerce (understanding customer intent and connecting them with the best information and products). We'll separate fact from fiction as we break down the hype around AI and show how it is being practically implemented today to power many real-world digital transformations for the next generation of employees and customers.
Natural Language Search with Knowledge Graphs (Chicago Meetup)Trey Grainger
To optimally interpret most natural language queries, it's important to develop a highly-nuanced, contextual interpretation of the domain-specific phrases, entities, commands, and relationships represented or implied within the search and within your domain.
In this talk, we'll walk through such a search system powered by Solr's Text Tagger and Semantic Knowledge graph. We'll have fun with some of the more search-centric use cases of knowledge graphs, such as entity extraction, query expansion, disambiguation, and pattern identification within our queries: for example, transforming the query "best bbq near activate" into:
{!func}mul(min(popularity,1),100) bbq^0.91032 ribs^0.65674 brisket^0.63386 doc_type:"restaurant" {!geofilt d=50 sfield="coordinates_pt" pt="38.916120,-77.045220"}
We'll see a live demo with real world data demonstrating how you can build and apply your own knowledge graphs to power much more relevant query understanding like this within your search engine.
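As a rough illustration of the first step in that pipeline, here is a minimal Python sketch (not from the talk) of calling Solr's Text Tagger to find known entities in a query; the "entities" collection name, the field list, and the entity schema are hypothetical:

import requests

# Hypothetical "entities" collection with a TaggerRequestHandler configured
# at /tag (this handler ships with Solr 7.4+); the raw query text is the body.
query = "best bbq near activate"
resp = requests.post(
    "http://localhost:8983/solr/entities/tag",
    params={"overlaps": "NO_SUB",      # keep only the longest non-overlapping tags
            "fl": "id,name,doc_type",  # entity fields to return (hypothetical schema)
            "wt": "json",
            "matchText": "true"},      # echo each matched substring back
    headers={"Content-Type": "text/plain"},
    data=query,
)
for tag in resp.json()["tags"]:
    print(tag)  # offsets of each match plus the ids of the matching entity docs

Each returned tag identifies a known entity in the query, which downstream logic can then turn into filters, boosts, and expansions like the rewritten query shown above.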
The Next Generation of AI-powered SearchTrey Grainger
What does it really mean to deliver an "AI-powered Search" solution? In this talk, we’ll bring clarity to this topic, showing you how to marry the art of the possible with the real-world challenges involved in understanding your content, your users, and your domain. We'll dive into emerging trends in AI-powered Search, as well as many of the stumbling blocks found in even the most advanced AI and Search applications, showing how to proactively plan for and avoid them. We'll walk through the various uses of reflected intelligence and feedback loops for continuous learning from user behavioral signals and content updates, also covering the increasing importance of virtual assistants and personalized search use cases found within the intersection of traditional search and recommendation engines. Our goal will be to provide a baseline of mainstream AI-powered Search capabilities available today, and to paint a picture of what we can all expect just on the horizon.
"Searching for Meaning: The Hidden Structure in Unstructured Data". Presentation by Trey Grainger at the Southern Data Science Conference (SDSC) 2018. Covers linguistic theory, application in search and information retrieval, and knowledge graph and ontology learning methods for automatically deriving contextualized meaning from unstructured (free text) content.
Natural Language Search with Knowledge Graphs (Haystack 2019)Trey Grainger
To optimally interpret most natural language queries, it is necessary to understand the phrases, entities, commands, and relationships represented or implied within the search. Knowledge graphs serve as useful instantiations of ontologies which can help represent this kind of knowledge within a domain.
In this talk, we'll walk through techniques to build knowledge graphs automatically from your own domain-specific content, how you can update and edit the nodes and relationships, and how you can seamlessly integrate them into your search solution for enhanced query interpretation and semantic search. We'll have some fun with some of the more search-centric use cases of knowledge graphs, such as entity extraction, query expansion, disambiguation, and pattern identification within our queries: for example, transforming the query "bbq near haystack" into:
{ "filter": ["doc_type:restaurant"], "query": { "boost": { "b": "recip(geodist(38.034780,-78.486790),1,1000,1000)", "query": "bbq OR barbeque OR barbecue" } } }
We'll also specifically cover use of the Semantic Knowledge Graph, a particularly interesting knowledge graph implementation available within Apache Solr that can be auto-generated from your own domain-specific content and which provides highly-nuanced, contextual interpretation of all of the terms, phrases and entities within your domain. We'll see a live demo with real world data demonstrating how you can build and apply your own knowledge graphs to power much more relevant query understanding within your search engine.
Natural Language Search with Knowledge Graphs (Activate 2019)Trey Grainger
To optimally interpret most natural language queries, it's important to develop a highly-nuanced, contextual interpretation of the domain-specific phrases, entities, commands, and relationships represented or implied within the search and within your domain.
In this talk, we'll walk through such a search system powered by Solr's Text Tagger and Semantic Knowledge graph. We'll have fun with some of the more search-centric use cases of knowledge graphs, such as entity extraction, query expansion, disambiguation, and pattern identification within our queries: for example, transforming the query "best bbq near activate" into:
{!func}mul(min(popularity,1),100) bbq^0.91032 ribs^0.65674 brisket^0.63386 doc_type:"restaurant" {!geofilt d=50 sfield="coordinates_pt" pt="38.916120,-77.045220"}
We'll see a live demo with real world data demonstrating how you can build and apply your own knowledge graphs to power much more relevant query understanding like this within your search engine.
Presentation of the Semantic Knowledge Graph research paper at the 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (Montreal, Canada - October 18th, 2016)
Abstract—This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, edges between any combination of nodes can be materialized and scored on the fly to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendation systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) by dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain.
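To make the core mechanism concrete, here is a toy Python sketch of edges materializing from intersecting postings lists and being scored against corpus statistics; the lift-style score is a simple stand-in for, not a reproduction of, the paper's relatedness measure:

postings = {                      # hypothetical inverted index: term -> doc ids
    "data science":     {1, 2, 3, 5, 8},
    "machine learning": {2, 3, 5, 8, 13},
    "cooking":          {4, 9},
}
num_docs = 20                     # size of the hypothetical corpus

def edge_score(a, b):
    """Score the dynamically materialized edge between terms a and b:
    observed joint probability vs. what independence would predict."""
    inter = postings[a] & postings[b]              # the edge = intersecting docs
    p_joint = len(inter) / num_docs
    p_indep = (len(postings[a]) / num_docs) * (len(postings[b]) / num_docs)
    return p_joint / p_indep if p_indep else 0.0

print(edge_score("data science", "machine learning"))  # 3.2 -> strongly related
print(edge_score("data science", "cooking"))           # 0.0 -> no edge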
Closing keynote by Trey Grainger from Activate 2018 in Montreal, Canada. Covers trends in the intersection of Search (Information Retrieval) and Artificial Intelligence, and the underlying capabilities needed to deliver those trends at scale.
Interleaving, Evaluation to Self-learning Search @904LabsJohn T. Kane
Presented at the Open Source Connections Haystack Relevance Conference on 904Labs' "Interleaving: from Evaluation to Self-Learning". 904Labs is the first to commercialize "Online Learning to Rank" as a state-of-the-art approach to self-learning search ranking that automatically takes your customers' behavior into account for personalized search results.
South Big Data Hub: Text Data Analysis PanelTrey Grainger
Slides from Trey's opening presentation for the South Big Data Hub's Text Data Analysis Panel on December 8th, 2016. Trey provided a quick introduction to Apache Solr, described how companies are using Solr to power relevant search in industry, and provided a glimpse of where the industry is heading with regard to implementing more intelligent and relevant semantic search.
This presentation introduces text analytics, its applications, and the various tools/algorithms used in this process. Some of the important ones are listed below:
- Decision trees
- SVM
- Naive-Bayes
- K-nearest neighbours
- Artificial Neural Networks
- Fuzzy C-Means
- Latent Dirichlet Allocation
The Apache Solr Semantic Knowledge GraphTrey Grainger
What if instead of a query returning documents, you could alternatively return other keywords most related to the query: e.g., given a search for "data science", get back results like "machine learning", "predictive modeling", "artificial neural networks", etc.? Solr’s Semantic Knowledge Graph does just that. It leverages the inverted index to automatically model the significance of relationships between every term in the inverted index (even across multiple fields), allowing real-time traversal and ranking of any relationship within your documents. Use cases for the Semantic Knowledge Graph include disambiguation of multiple meanings of terms (does "driver" mean truck driver, printer driver, a type of golf club, etc.), searching on vectors of related keywords to form a conceptual search (versus just a text match), powering recommendation algorithms, ranking lists of keywords based upon conceptual cohesion to reduce noise, summarizing documents by extracting their most significant terms, and numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. In this talk, we'll do a deep dive into the internals of how the Semantic Knowledge Graph works and will walk you through how to get up and running with an example dataset to explore the meaningful relationships hidden within your data.
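For readers who want to experiment, recent Solr versions expose the Semantic Knowledge Graph through the relatedness() aggregate of the JSON Facet API. A minimal sketch, assuming a hypothetical "articles" collection with an indexed "content" field:

import requests

body = {
    "query": 'content:"data science"',
    "limit": 0,
    "params": {"fore": 'content:"data science"',  # foreground set
               "back": "*:*"},                    # background set
    "facet": {
        "related_terms": {
            "type": "terms",
            "field": "content",
            "limit": 10,
            "sort": {"r": "desc"},                      # rank by relatedness,
            "facet": {"r": "relatedness($fore,$back)"}  # not raw frequency
        }
    },
}
resp = requests.post("http://localhost:8983/solr/articles/query", json=body)
for bucket in resp.json()["facets"]["related_terms"]["buckets"]:
    print(bucket["val"], bucket["r"]["relatedness"])

Sorting buckets by relatedness() rather than raw counts is what surfaces conceptually related terms like "machine learning" instead of merely frequent co-occurring words.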
Thought Vectors and Knowledge Graphs in AI-powered SearchTrey Grainger
While traditional keyword search is still useful, pure text-based keyword matching is quickly becoming obsolete; today, it is a necessary but not sufficient tool for delivering relevant results and intelligent search experiences.
In this talk, we'll cover some of the emerging trends in AI-powered search, including the use of thought vectors (multi-level vector embeddings) and semantic knowledge graphs to contextually interpret and conceptualize queries. We'll walk through some live query interpretation demos to demonstrate the power that can be delivered through these semantic search techniques leveraging auto-generated knowledge graphs learned from your content and user interactions.
Balancing the Dimensions of User IntentTrey Grainger
The first step in returning relevant search results is successfully interpreting the user’s intent. This requires combining a holistic understanding of your content, your users, and your domain. Traditional keyword search focuses on the content-understanding dimension. Knowledge graphs are then typically built and leveraged to represent an understanding of your domain. Finally, collaborative recommendations and user profile learning are typically the tools of choice for generating and modeling an understanding of the preferences of each user.
While these systems (search, recommendations, and knowledge graphs) are often built and used in isolation, combining them together is the key to truly understanding a user’s query intent. For example, combining traditional keyword search with your knowledge graph leads to semantic search capabilities, and combining traditional keyword search with recommendations leads to personalized search experiences. Combining all of these dimensions together in an appropriately balanced way will ultimately lead to the most accurate interpretation of a user’s query, resulting in a better query to the core search engine and ultimately a better, more relevant search experience.
In this talk, we’ll demonstrate strategies for delivering and combining each of these dimensions of user intent, and we’ll walk through concrete examples of how to balance the nuances of each so that you don’t over-personalize, over-contextualize, or underappreciate the nuances of your user’s intent.
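One simple way to picture the balancing act is as a weighted blend of the three scores. A schematic sketch (not from the talk), with purely illustrative weights:

def blended_score(keyword_score, kg_score, user_score,
                  w_content=0.5, w_domain=0.3, w_user=0.2):
    """Weights are illustrative; in practice they would be tuned or learned
    so that no single dimension of intent dominates the final ranking."""
    return (w_content * keyword_score +
            w_domain * kg_score +
            w_user * user_score)

print(blended_score(keyword_score=0.8, kg_score=0.6, user_score=0.1))  # ~0.6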
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineTrey Grainger
Search engines frequently miss the mark when it comes to understanding user intent. This talk will describe how to overcome this by leveraging Lucene/Solr to power a knowledge graph that can extract phrases, understand and weight the semantic relationships between those phrases and known entities, and expand the query to include those additional conceptual relationships. For example, if a user types in (Senior Java Developer Portland, OR Hadoop), you or I know that the term “senior” designates an experience level, that “java developer” is a job title related to “software engineering”, that “portland, or” is a city with a specific geographical boundary, and that “hadoop” is a technology related to terms like “hbase”, “hive”, and “map/reduce”. Out of the box, however, most search engines just parse this query as text:((senior AND java AND developer AND portland) OR (hadoop)), which is not at all what the user intended. We will discuss how to train the search engine to parse the query into this intended understanding, and how to reflect this understanding to the end user to provide an insightful, augmented search experience. Topics: Semantic Search, Finite State Transducers, Probabilistic Parsing, Bayes Theorem, Augmented Search, Recommendations, NLP, Knowledge Graphs
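As a rough sketch of the rewriting step this describes, once a tagger has identified the entities in the query, each can be mapped to its semantic type and expanded into a structured query rather than a bag of keywords; the entity types, field names, and expansions below are illustrative, not from the talk:

tagged = [                        # hypothetical output of an entity tagger
    ("senior", "experience_level"),
    ("java developer", "job_title"),
    ("portland, or", "city"),
    ("hadoop", "skill"),
]
expansions = {"java developer": ["software engineer"],
              "hadoop": ["hbase", "hive", "map/reduce"]}

clauses = []
for text, entity_type in tagged:
    variants = [text] + expansions.get(text, [])
    ored = " OR ".join(f'"{v}"' for v in variants)
    clauses.append(f"{entity_type}:({ored})")

print(" AND ".join(clauses))
# experience_level:("senior") AND job_title:("java developer" OR "software engineer") AND ...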
Python for Data Science - Python Brasil 11 (2015)Gabriel Moreira
This talk demonstrates a complete Data Science process, involving Obtaining, Scrubbing, Exploring, Modeling and Interpreting data using Python ecosystem tools, like IPython Notebook, Pandas, Matplotlib, NumPy, SciPy and Scikit-learn.
Mike King examines the state of the SEO industry and talks through how knowing information retrieval will help improve our understanding of Google. This talk debuted at MozCon.
Search Solutions 2011: Successful Enterprise Search By DesignMarianne Sweeny
When your colleagues say they want Google, they don’t mean the Google Search Appliance. They mean the Google Search user experience: pervasive, expedient and delivering the information that they need. Successful enterprise search does not start with the application features, is not part of the information architecture, does not come from a controlled vocabulary and does not emerge on its own from the developers. It requires enterprise-specific data mining, enterprise-specific user-centered design and fine tuning to turn “search sucks” into search success within the firewall. This presentation looks at action items, tools and deliverables for Discovery, Planning, Design and Post Launch phases of an enterprise search deployment.
Searching the ever-growing amount of global data and research results and retrieving only the relevant and up-to-date information is becoming more and more challenging. The amount of data, including the big data issues of the IoT world, makes it even more challenging. How can employees keep themselves up to date, incorporate the relevant information into their work, and ensure their work reflects the most relevant and latest information? Most search engines today provide some sort of semantically based answers to the queries you enter into the system. However, most search engines do not know you well enough to provide you with the best answers based on who you are and what you really want as an answer. That is today's challenge, combined with the growing amount of data and the media it lives in. The answer might be closer than you think.
This is a presentation I delivered at Enterprise Data World 2018 to make the case for developing intelligent systems using a hybrid or blended approach, combining statistical machine learning with knowledge-based approaches that involve ontologies, taxonomies or knowledge graphs.
This talk will feature some of my recent research into alternative uses for Solr facets and facet metadata. I will develop the idea that facets can be used to discover similarities between items and attributes in a search index, and show some interesting applications of this idea. A common takeaway is that using facets and facet metadata in non-conventional ways enables the semantic context of a query to be automatically tuned. This has important implications for user-centric and semantically focused relevance.
professional fuzzy type-ahead rummage around in xml type-ahead search techni...Kumar Goud
Abstract – This is research on the new information-access paradigm called type-ahead search, in which systems find answers to a keyword query on the fly as users type it in. In this paper we study how to support fuzzy type-ahead search in XML. Supporting fuzzy search is important when users have limited knowledge about the exact representation of the entities they are looking for, such as people records in an online directory. We have developed and deployed several such systems, some of which are used by many people on a daily basis. The systems have received overwhelmingly positive feedback from users due to their friendly interfaces with the fuzzy-search feature. We describe the design and implementation of the systems and demonstrate several of them. We show that our efficient techniques can indeed allow this search paradigm to scale to large amounts of data.
Index Terms - type-ahead, large data set, server side, online directory, search technique.
I was invited to speak at OMCap Berlin 2014 about the close relationship between search engines and user experience with prescriptive guidance to gain higher rankings and more conversions.
Discovering User's Topics of Interest in Recommender Systems @ Meetup Machine...Gabriel Moreira
This talk introduces the main techniques of Recommender Systems and Topic Modeling. Then, we present a case of how we've combined those techniques to build Smart Canvas, a SaaS that allows people to bring, create and curate content relevant to their organization, and also helps to tear down knowledge silos.
We give a deep dive into the design of our large-scale recommendation algorithms, giving special attention to a content-based approach that uses topic modeling techniques (like LDA and NMF) to discover people’s topics of interest from unstructured text, and social-based algorithms using a graph database connecting content, people and teams around topics.
Our typical data pipeline includes the ingestion of millions of user events (using Google PubSub and BigQuery), the batch processing of the models (with PySpark, MLlib, and Scikit-learn), the online recommendations (with Google App Engine, Titan Graph Database and Elasticsearch), and the data-driven evaluation of UX and algorithms through A/B testing experimentation. We also touch on the non-functional requirements of a software-as-a-service, like scalability, performance, availability, reliability and multi-tenancy, and how we addressed them in a robust architecture deployed on Google Cloud Platform.
Short bio: Gabriel Moreira is a scientist passionate about solving problems with data. He is Head of Machine Learning at CI&T and a doctoral student at Instituto Tecnológico de Aeronáutica (ITA), where he also earned his Master of Science. His current research interests are recommender systems and deep learning.
https://www.meetup.com/pt-BR/machine-learning-big-data-engenharia/events/239037949/
5. If wealth is knowledge, then knowledge about our domain, and the ability to model it accurately, could be said to be "value". "Value" is the proposition that drives users to engage with search.
6. In computer graphics, 3D models are simplified for real-time applications (video games). Fidelity is preserved by applying a high-fidelity proxy to the lower-fidelity "real-time" representation. This process is called "baking".
7. In machine learning, when we ‘train’ a model, we are ‘baking’ knowledge into a more efficient representation. The same is true for how we might enhance searches by using external datasets, query statistics, LTR, etc. Modeling a high-fidelity representation of data into a real-time, more efficient form is key to climbing the ladder of search sophistication.
8. Representing domain knowledge within our search platform so that it provides value to our users is how we achieve sophistication. This is perhaps the greatest challenge in building search products.
9. Our premise
Intent IS accuracy, recall IS relevancy
This may be controversial; recall vs accuracy is the wrong juxtaposition.
10. Our premise
Perhaps the best way this relationship can be described is:
- The fidelity of a domain model impacts recall
- Accuracy is linked to our domain model
- Relevancy is linked to accuracy
- Accuracy is best modeled by understanding intent
- Restrictive queries shouldn’t be presumed to be accurate; accuracy exists independent of the percent of documents matched.
11. Our premise
If accuracy is the ultimate goal, and recall is a part of accuracy, how do we go about achieving this?
14. Modeling Knowledge
Let’s return to discussing sophistication. Earlier we made the claim that knowledge is what provides value. We also said that modeling knowledge is difficult.
Implementing maturity in our search platform is what allows us to model our domain knowledge.
16. Modeling Knowledge
Observation… It’s really hard for most organizations to climb the sophistication ladder that was shown in the previous slide.
17. Out of the box
Scorer (default similarity)
Query Handler (Edismax)
Import Handlers
Analyzers / TokenFilters
Boost Functions
18. What we need
Query Classifiers
ML Models
Behavior Sampling / Ingestion
Identity Awareness
Secondary Data Sources (data connectors)
Alternative forms of storage (inverted index)
Integrations (Spark, Airflow, etc.)
Collections as “Containers” for behavior.
19. Modeling Knowledge
When we model our domain we want to model "things", which we can call "entities".
Modeling entities in any domain can be extremely valuable.
25. Modeling Knowledge - Entities
1.) disambiguation for free
2.) fairly easy to generate candidates for any domain
3.) fairly well researched area of ML
4.) helps in the modeling of “conceptual” synonyms
5.) must be pruned by user feedback / behavior
6.) groundwork for higher-level, more sophisticated features
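Point 2 above is worth demonstrating: phrase candidates can be generated from any corpus with simple collocation statistics. A minimal sketch using pointwise mutual information (the corpus here is a stand-in):

    import math
    from collections import Counter

    docs = ["registered nurse wanted", "registered nurse icu",
            "java developer", "senior java developer"]  # stand-in corpus

    unigrams, bigrams = Counter(), Counter()
    for doc in docs:
        toks = doc.split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))

    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())

    def pmi(a, b):
        # High PMI: the pair co-occurs far more than chance -> phrase candidate.
        return math.log((bigrams[(a, b)] / total_bi) /
                        ((unigrams[a] / total_uni) * (unigrams[b] / total_uni)))

    candidates = sorted(bigrams, key=lambda pair: -pmi(*pair))
    print(candidates[:3])  # e.g. ('registered', 'nurse') ranks highly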
28. Modeling Knowledge - Entities
Ok, but why?
In the previous slide we saw that 40% of Target Corporation's searches are low-information, and Target doesn't know what those searches mean. Without modeling your corpus (the content you are searching) you won't be able to reason about the behavior of, or relationships between, searches, actions, and ultimately intent.
It is extremely common for a good portion of searches (around half) not to provide the information necessary to give relevant term-based search results.
This is at the core of the case for sophistication. Term search simply can't provide useful results for a large number of the searches your users are going to perform.
30. Modeling Knowledge – Truth Systems
entity     feature   value
plato      isA       philosopher
socrates   isA       philosopher
plato      knew      socrates
socrates   knew      plato
plato      isA       historical-figure
socrates   isA       historical-figure
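A truth system like this is just a set of (entity, feature, value) triples, which is easy to hold and query in memory; a sketch (the API shape here is an assumption):

    triples = {
        ("plato", "isA", "philosopher"),
        ("socrates", "isA", "philosopher"),
        ("plato", "knew", "socrates"),
        ("socrates", "knew", "plato"),
        ("plato", "isA", "historical-figure"),
        ("socrates", "isA", "historical-figure"),
    }

    def query(entity=None, feature=None, value=None):
        # None acts as a wildcard, so query(feature="isA", value="philosopher")
        # returns every philosopher the truth system knows about.
        return [(e, f, v) for (e, f, v) in triples
                if entity in (None, e) and feature in (None, f) and value in (None, v)]

    print(query(feature="isA", value="philosopher"))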
31. Modeling Knowledge - Similarity
Socrates != Plato
- Related, but not the same
- One is not a subset of the other
- Found in many of the same documents
- Found in many of the same contexts
- This is where automatic similarity methods fall down a bit.
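To see why automatic similarity falls down here: co-occurrence vectors for Socrates and Plato look nearly identical, so cosine similarity happily conflates two entities that a truth system would keep distinct. The counts below are toy numbers, purely illustrative:

    import math

    # Hypothetical co-occurrence counts against a few context terms.
    contexts = ["dialogue", "athens", "philosophy", "trial"]
    socrates = [40, 35, 50, 20]
    plato    = [45, 30, 55, 5]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    print(round(cosine(socrates, plato), 3))  # ~0.97: "similar", but not the same entity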
32. Modeling Knowledge - Ontologies
Entities and ontologies can work together...
- when building ontologies there are different types of relationships
- word2vec / phrase2vec and LSA cannot be used by themselves
- ontologies can be pruned and reshaped by supervised learning
- ontologies can be reshaped by feature systems (truth systems)
- the most useful ontologies are modeled for a specific feature (e.g., product titles)
- a query classifier can choose between similarity features / models
34. Modeling Knowledge
We can’t simply rely on our corpus to provide us with the information necessary to model
our domain. We must use auxiliary data sources.
Fortunately there are many open data sources in the world that we can use to augment
our understanding of our corpus.
38. In the previous slides we saw entity mapping and grading for a job-search domain model. This was accomplished by building candidate phrases and then pruning them using an SVM trained on features from a known-good data source with phrases and topics already labeled; a sketch of that pruning step follows below.
Also shown was a query classifier that takes a lazy or poorly constructed query, groups the components of the query logically, and expands part of the query based on what it knows about the index and the availability and relatedness of terms.
A model to classify queries can be built by understanding the relationship between search entities and the entities and information contained within a document.
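A sketch of that pruning step, assuming you already have candidate phrases plus a labeled seed set (the feature choice here is illustrative, not the one used in the talk):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Labeled seed phrases from a known-good source: 1 = real domain phrase, 0 = noise.
    seed = ["registered nurse", "java developer", "and the", "click here",
            "truck driver", "of a"]
    labels = [1, 1, 0, 0, 1, 0]

    svm = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                        LinearSVC())
    svm.fit(seed, labels)

    candidates = ["senior java developer", "in the"]
    kept = [c for c, y in zip(candidates, svm.predict(candidates)) if y == 1]
    print(kept)  # candidates the model grades as real phrases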
39. SHReC is a Java package implementing a hierarchical document clustering algorithm based on a statistical co-occurrence measure called subsumption. The algorithm is particularly suited to the problem of online "search results" clustering, requiring only small amounts of text data. - http://shrec.sourceforge.net/
[Diagram: Search → Action → Document]
SHReC along with an entity model can be used to prune, grade, and reorganize an ontology to better understand the types and accuracy of relationships. Algorithms used to cluster behavior with search terms are invaluable in modeling search intent and rewriting search queries.
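The subsumption measure itself is simple to compute from document co-occurrence. A common formulation (following Sanderson & Croft) says term x subsumes term y when P(x|y) is high but P(y|x) < 1; a minimal sketch over a toy corpus:

    docs = [{"bbq", "ribs"}, {"bbq", "brisket"}, {"bbq"}, {"ribs", "bbq"}]

    def p(x, given):
        having_given = [d for d in docs if given in d]
        return sum(x in d for d in having_given) / len(having_given)

    def subsumes(x, y, threshold=0.8):
        # x is the broader concept: it appears in (nearly) every doc y appears in,
        # but y does not appear everywhere x does.
        return p(x, given=y) >= threshold and p(y, given=x) < 1

    print(subsumes("bbq", "ribs"))  # True: "bbq" subsumes "ribs" in this toy corpus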
40. The perfect combination of phrase boosting, multi-term synonyms, term position (proximity), and performance is a frequent question within the community.
41. Exact Phrase Matches → PhraseQuery / SpanQuery
Proximity of Terms → SpanQuery
Related Phrases → Payloads / Index Time Synonyms
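In Solr terms, the first two rows of this mapping usually translate into phrase/slop syntax or edismax's pf/ps parameters (a sketch; URL and field names are assumptions):

    import requests

    params = {
        "q": '"java developer"~3',   # PhraseQuery with slop: terms within 3 positions
        "defType": "edismax",
        "pf": "title^5",             # re-boost docs where the whole query is a phrase
        "ps": "2",                   # phrase slop applied to the pf boost
        "wt": "json",
    }
    requests.get("http://localhost:8983/solr/jobs/select", params=params)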
42. Currently in Solr there is no built-in way to represent related entities efficiently. Query rewriting or expansion can be performed at query time, but not all relationships can be modeled there due to the complexity of the resulting queries.
Different classifications of synonyms within the index are an option, as is using payloads to assign relatedness scores to a given entity (see the sketch below).
All index-side synonym solutions are quite custom and are not quick to implement.
Better tools are needed to correctly model graphs of terms or entities and to create rules for how and when to rewrite search queries without resorting to crude rule-based systems.
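One workable (if custom) pattern for payload-scored synonyms: index a side field whose type uses DelimitedPayloadTokenFilterFactory, store relatedness as the payload, and query it with Solr's payload_score parser. A sketch; the collection and field names are assumptions:

    import requests

    # Index time: encode each related entity with its relatedness score as a
    # payload (assumes the field type uses DelimitedPayloadTokenFilterFactory
    # with delimiter "|" and a float encoder).
    doc = {"id": "1", "title": "Texas BBQ House",
           "related_terms": "bbq|1.0 ribs|0.65 brisket|0.63"}
    requests.post("http://localhost:8983/solr/places/update?commit=true", json=[doc])

    # Query time: payload_score folds the stored relatedness into the score.
    params = {"q": '{!payload_score f=related_terms func=max v="ribs"}', "wt": "json"}
    requests.get("http://localhost:8983/solr/places/select", params=params)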
46. Conclusion
- Modeling the world through language is hard.
- Modeling phrases and entities makes life a little easier.
- Phrases form the basis of relationships.
- Accuracy should be proportional to confidence.
Editor's Notes
With that said, there is a relationship between recall and accuracy; that's not up for debate. What is often missed is the discussion of recall and its relationship to intent.
To have "expanded" recall we must model our domain; to do this we need entities and an understanding of the relationships between them.
Intent is domain specific, so we want to find easier ways to model it.
When we model intent we want to do all the things you do with search: test it, debug it, update it, reinforce it with judgements.
Solr and Elasticsearch provide good primitives out of the box.
The demands of modern search applications require more layered and sophisticated primitives.
So I've talked about modeling entities, but why do we need to do this… can't we get most of the way there with traditional search? Hasn't it worked fine for most people until now?
This is a slide from a Target Corporation presentation.
A lot of their searches are long-tail with poor matches.
Also, when their model can't match both words within a given category, they fall back to searching for only the one term that has the most matches.
This is a great example of where understanding the relationship between searches and entities can be very important.
Single-term searches are a huge issue in job search, and interpreting what they mean is a challenge.
Modeling entities in our domain helps us begin to understand the relationship between term searches and the types of documents viewed.
Entity approaches can also help us understand an individual searcher's affinity within ambiguous contexts.
You can imagine a search system in which we are modeling people. This might be useful for a library or research system.
Which brings us to traditional challenges with similarity and where it can go off the rails.
Ontologies are a huge subject, but really what we are describing is a graph database with edges that are informed by what we know about a particular entity.
One entity may be perfectly related to another.
If we were building a library data system we might have an ontology of historical persons.
The ontology might provide features telling us whether a person was an inventor or a politician.
A simple ontology can be constructed for job titles from query logs and reviewed / pruned by hand.
Supervised ML approaches can get you pretty close to this as well.
Conceptual relationships from phrases are easier to model than those from single words or simpler language.
- The accuracy or broadness of a search query should be related to how well we understand what's being searched for.
- In the absence of high confidence, search should fall back to a default algorithm.