The document discusses various approaches to AI-powered search, including content understanding through keyword search, user understanding through collaborative recommendations, and combining the two through personalized search. It then covers domain understanding using knowledge graphs, combining domain and user understanding through domain-aware matching, and combining content and domain understanding through semantic search. Finally, it discusses balancing keyword, vector, and knowledge graph search approaches.
Thought Vectors and Knowledge Graphs in AI-powered Search – Trey Grainger
While traditional keyword search is still useful, pure text-based keyword matching is quickly becoming obsolete; today, it is a necessary but not sufficient tool for delivering relevant results and intelligent search experiences.
In this talk, we'll cover some of the emerging trends in AI-powered search, including the use of thought vectors (multi-level vector embeddings) and semantic knowledge graphs to contextually interpret and conceptualize queries. We'll walk through some live query interpretation demos to demonstrate the power that can be delivered through these semantic search techniques leveraging auto-generated knowledge graphs learned from your content and user interactions.
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb... – Trey Grainger
Search engines frequently miss the mark when it comes to understanding user intent. This talk will walk through some of the key building blocks necessary to turn a search engine into a dynamically-learning "intent engine", able to interpret and search on meaning, not just keywords. We will walk through CareerBuilder's semantic search architecture, including semantic autocomplete, query and document interpretation, probabilistic query parsing, automatic taxonomy discovery, keyword disambiguation, and personalization based upon user context/behavior. We will also see how to leverage an inverted index (Lucene/Solr) as a knowledge graph that can be used as a dynamic ontology to extract phrases, understand and weight the semantic relationships between those phrases and known entities, and expand the query to include those additional conceptual relationships.
As an example, most search engines completely miss the mark at parsing a query like (Senior Java Developer Portland, OR Hadoop). We will show how to dynamically understand that "senior" designates an experience level, that "java developer" is a job title related to "software engineering", that "portland, or" is a city with a specific geographical boundary (as opposed to a keyword followed by a boolean operator), and that "hadoop" is the skill "Apache Hadoop", which is also related to other terms like "hbase", "hive", and "map/reduce". We will discuss how to train the search engine to parse the query into this intended understanding and how to reflect this understanding to the end user to provide an insightful, augmented search experience.
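The tagging step described above can be sketched with a greedy longest-match lookup against a phrase-to-type knowledge base. The mini knowledge base and tag names below are hypothetical placeholders, not CareerBuilder's actual taxonomy; a real system would learn these mappings from content and behavioral data.

```python
# Hypothetical phrase -> (semantic type, canonical form) lookup.
KNOWLEDGE_BASE = {
    "senior": ("experience_level", "senior"),
    "java developer": ("job_title", "Java Developer"),
    "portland, or": ("city", "Portland, OR"),
    "hadoop": ("skill", "Apache Hadoop"),
}

def parse_query(query: str):
    """Greedy longest-match tagging of known phrases in a query."""
    tokens = query.lower().split()
    tagged, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at i, then shrink.
        for j in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:j])
            if phrase in KNOWLEDGE_BASE:
                tagged.append(KNOWLEDGE_BASE[phrase])
                i = j
                break
        else:
            # Unknown token falls back to a plain keyword.
            tagged.append(("keyword", tokens[i]))
            i += 1
    return tagged

print(parse_query("Senior Java Developer Portland, OR Hadoop"))
```

Note how "or" is absorbed into the city phrase "portland, or" by the longest-match rule, rather than being misread as a boolean operator.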
Topics: Semantic Search, Apache Solr, Finite State Transducers, Probabilistic Query Parsing, Bayes Theorem, Augmented Search, Recommendations, Query Disambiguation, NLP, Knowledge Graphs
The Python Cheat Sheet for the Busy Marketer – Hamlet Batista
What percentage of an Inbound marketer's day doesn't involve working with spreadsheets? How much of this work is time-consuming and repetitive? In this interactive session, you will learn how to manipulate Google Sheets to automate common data analysis workflows using Python, a very easy to use programming language.
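A minimal, stdlib-only sketch of the kind of repetitive spreadsheet work this session automates. The column names and metric below are invented for the example; live Google Sheets data would arrive via an API client rather than an inline CSV string.

```python
import csv
import io

# Inline stand-in for data exported from a marketing spreadsheet.
CSV_DATA = """page,clicks,impressions
/home,120,3000
/blog,80,1600
"""

def best_ctr_page(csv_text: str) -> str:
    """Return the page with the highest click-through rate."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        # Derive CTR from the raw click and impression counts.
        row["ctr"] = int(row["clicks"]) / int(row["impressions"])
    return max(rows, key=lambda r: r["ctr"])["page"]

print(best_ctr_page(CSV_DATA))  # /blog has the higher CTR (0.05 vs 0.04)
```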
Minority Log Report (Log Analysis for SEO) - ESHOW [CLINIC SEO 2018] – Luis M Villanueva
A talk explaining, from 0 to 100, how to analyze the logs of any kind of website and extract the data that is genuinely useful for SEO: from analyzing medium-sized sites with Excel, Power Query, and Power Pivot, supported by Screaming Frog Log File Analyser, to working with an in-house tool that combines log, Analytics, and Search Console databases in order to cross-reference the data and analyze the logs of LARGE sites to improve their SEO.
LOGS ELIMINATE GUESSWORK!
Lexical Semantics, Semantic Similarity and Relevance for SEO – Koray Tugberk GUBUR
Lexical semantics covers the relations between word meanings: superiority (hypernymy), inferiority (hyponymy), part (meronymy), whole (holonymy), opposition (antonymy), and sameness (synonymy). The same word can be a meronym, hyponym, or antonym of another word, depending on the words before or after it. The lexical relation of one word can affect the interpretation of the next, shaping the context of the sentence and the Information Retrieval (IR) Score. The IR Score measures how closely content relates to a query and its variants, as determined by the search engine's query processor matching the query against the document. A higher IR Score indicates better relevance and a greater likelihood of click satisfaction.
The problem with a semi-structured, distracting context is that if a document is not focused on a single topic, its IR Score can be diluted across the competing contexts, causing it to lose relative rank to a more focused document.
IR Score dilution results from badly structured lexical relations combined with poor word proximity. Words that complete each other within the meaning map should appear close together, within the same paragraph or section, to signal the context more clearly and increase the IR Score. A search engine can check whether a document contains hyponyms of the words in the query, and can generate query predictions from the query's hypernyms. It can also inspect anchor texts for words within the "hyponym distance", the hyponym depth separating two words in the taxonomy.
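The "hyponym distance" idea can be illustrated as the number of hypernym steps between two words in a hierarchy. The miniature taxonomy below is invented for the example; a real system would draw on a resource such as WordNet.

```python
# Toy child -> parent (hypernym) links, invented for illustration.
HYPERNYM = {
    "poodle": "dog",
    "dog": "canine",
    "canine": "mammal",
    "mammal": "animal",
    "cat": "feline",
    "feline": "mammal",
}

def hyponym_distance(word, ancestor):
    """Number of hypernym steps from `word` up to `ancestor`, or None
    if `ancestor` is not on the word's hypernym chain."""
    depth = 0
    while word != ancestor:
        if word not in HYPERNYM:
            return None
        word = HYPERNYM[word]
        depth += 1
    return depth

print(hyponym_distance("poodle", "mammal"))  # poodle -> dog -> canine -> mammal = 3
```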
Lexical relations can supply the semantic annotations for a document. A semantic annotation is a word that describes the document as a whole: its category and the main context that carries its purpose. It can name the document's main entity, or a general concept covering a broader meaning area (knowledge domain). Semantic annotations can be generated from the lexical relations between words, used to match the document to the query, and factored into a better IR Score.
A search engine can generate phrase patterns from the lexical relationships between words in queries or documents. A phrase pattern contains sections that define a concept with qualifiers; it might place a hyponym just after an adjective, or a hypernym with the antonym of the same adjective. Many of these connections and patterns are exploited by recurrent neural networks (RNNs) for next-word prediction. A phrase pattern raises a search engine's confidence when relating a document to a specific query, or to the meaning of that query.
Building a semantic search system - one that can correctly parse and interpret end-user intent and return the ideal results for users’ queries - is not an easy task. It requires semantically parsing the terms, phrases, and structure within queries, disambiguating polysemous terms, correcting misspellings, expanding to conceptually synonymous or related concepts, and rewriting queries in a way that maps the correct interpretation of each end user’s query into the ideal representation of features and weights that will return the best results for that user. Not only that, but the above must often be done within the confines of a very specific domain - rife with its own jargon and linguistic and conceptual nuances.
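Two of the rewriting steps named above, spelling correction and expansion to related concepts, can be sketched in a few lines. The dictionaries here are invented placeholders; real systems learn them from query logs and content rather than hard-coding them.

```python
# Hypothetical misspelling and concept-expansion tables.
SPELL_FIX = {"pyhton": "python", "developr": "developer"}
EXPANSIONS = {"python": ["django", "flask"], "developer": ["engineer"]}

def rewrite_query(query: str) -> str:
    """Correct misspellings, then expand terms to related concepts
    as an OR-group, producing a rewritten boolean query string."""
    terms = []
    for tok in query.lower().split():
        tok = SPELL_FIX.get(tok, tok)        # spelling correction
        related = EXPANSIONS.get(tok, [])    # conceptually related terms
        if related:
            terms.append("(" + " OR ".join([tok] + related) + ")")
        else:
            terms.append(tok)
    return " ".join(terms)

print(rewrite_query("pyhton developr jobs"))
```

A production pipeline would of course also handle phrase detection, disambiguation, and per-term weighting, as the paragraph above describes.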
This talk will walk through the anatomy of a semantic search system and how each of the pieces described above fit together to deliver a final solution. We'll leverage several recently-released capabilities in Apache Solr (the Semantic Knowledge Graph, Solr Text Tagger, Statistical Phrase Identifier) and Lucidworks Fusion (query log mining, misspelling job, word2vec job, query pipelines, relevancy experiment backtesting) to show you an end-to-end working Semantic Search system that can automatically learn the nuances of any domain and deliver a substantially more relevant search experience.
A beginner's guide to machine learning for SEOs - WTSFest 2022 – LazarinaStoyanova
This is a beginner's guide to machine learning, tailored to the SEO industry. It breaks down the challenges that hold us back from experimenting, explains machine learning's main characteristics to help us implement it a little better, and shows ways we can embed advanced technology into our daily practice.
This is a presentation on the basics of how Elasticsearch works.
It focuses on search services rather than logging, so please treat it as introductory material for anyone struggling with Korean-language search.
If you spot incorrect information or typos, or have any questions, feel free to email dydwls121200@gmail.com at any time; it helps me grow as well. You are welcome.
How to Incorporate ML in your SERP Analysis, Lazarina Stoy - BrightonSEO Oct, ... – LazarinaStoyanova
How are you currently doing SERP analysis? Is your approach efficient or scalable?
How can SEOs incorporate programmatic approaches and advances in machine learning to identify winning strategies?
This talk will leave you with a better understanding of what is possible for SERP analysis at scale, what insights you can capture with the help of machine learning quickly, and how to incorporate insights into your strategy and visualize your findings to impress your stakeholders.
Building a Search Intent-Driven Website Architecture (SEO Mastery Summit 2022... – LazarinaStoyanova
In this talk, given for SEO Mastery Summit 2022, I:
- propose a continuous improvement planning approach for web architecture
- highlight the importance of goal-driven site architecture planning
- highlight the relationship between site goals, user personas, and search intent
- demonstrate ways of considering search intent in both page planning and structure planning
- highlight the benefits of having an intent-driven website architecture
- highlight different ways to troubleshoot for intent mismatch
- recommend some further reading to set you on a path to success ✨
If you want to read the talk write-up, head over to: https://lazarinastoy.com/building-a-search-intent-driven-website-architecture/
PubCon, Lazarina Stoy. - Machine Learning in Search: Google's ML APIs vs Open... – LazarinaStoyanova
In this presentation, I go through the different use cases of machine learning APIs for search marketers and digital marketers. Specifically, I look at APIs by Google Cloud and OpenAI and identify the best to use for your SEO projects, based on the task at hand.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers that are used to build php/mysql apps to broaden their horizon when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
Whilst passage indexing may seem like a small tweak to search ranking, it is potentially much more symptomatic of the beginning of a fundamental shift in the way that search engines understand unstructured content, determine relevance in natural language, and rank efficiently and effectively.
It could also be a means of assessing overall quality of content and a means of dynamic index pruning. We will look at the landscape, and also provide some takeaways for brands and business owners looking to improve quality in unstructured content overall in this fast changing landscape.
In this talk, Beth will give an introduction of what schema is, where it sits within a structured data framework, and its use cases. Then she’ll move on to the types that can be used or are no longer recognised by search engines. This will be followed by a hands-on discussion of how to audit a website, how to write the relevant code, and ways to implement and test.
The Reason Behind Semantic SEO: Why does Google Avoid the Word PageRank? – Koray Tugberk GUBUR
This article delves into the concepts of Semantic SEO, Topical Authority, and PageRank, exploring their relationships and how they benefit both website owners and search engines. By leveraging Natural Language Processing (NLP) techniques, Semantic SEO improves search engine comprehension of content and enhances user experience, ultimately leading to better search results.
In the ever-evolving world of Search Engine Optimization (SEO), understanding the intricate connections between Semantic SEO, Topical Authority, and PageRank is crucial for webmasters, content creators, and marketers. These concepts play a vital role in enhancing the visibility and relevance of websites in search results.
Semantic SEO: Going Beyond Keywords
Semantic SEO involves optimizing content by focusing on the meaning and context of words, phrases, and sentences rather than merely targeting specific keywords. This is achieved through NLP techniques such as topic modeling, sentiment analysis, and entity recognition, which allow search engines to comprehend the true essence of content.
Topical Authority: Establishing Expertise and Trustworthiness
Topical Authority refers to the perceived expertise of a website or content creator in a specific subject area. By producing high-quality, relevant, and in-depth content, websites can establish themselves as authorities, earning the trust of both users and search engines. This translates into higher search rankings and increased visibility.
PageRank: Measuring the Importance of Webpages
PageRank is an algorithm used by Google to determine the significance of a webpage by analyzing the quality and quantity of its inbound links. A higher PageRank implies that a website is more authoritative and valuable, thus warranting a better position in search results.
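The link-analysis idea can be made concrete with a minimal power-iteration PageRank over a toy link graph. The damping factor of 0.85 follows the original formulation; the three-page graph itself is invented for the example.

```python
def pagerank(links, d=0.85, iters=50):
    """links: dict mapping each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}       # uniform starting ranks
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            if not outs:                     # dangling page: spread evenly
                for q in pages:
                    new[q] += d * rank[p] / n
            else:                            # share rank across outlinks
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # "c" attracts the most link weight
```

Page "c" wins here because it receives links from both "a" and "b", illustrating how quantity and placement of inbound links drive the score.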
The Interrelation of Semantic SEO, Topical Authority, and PageRank
Semantic SEO, Topical Authority, and PageRank are interconnected concepts that work in tandem to improve a website's search performance. By focusing on Semantic SEO, content creators can enhance their Topical Authority and establish a solid online presence. This, in turn, can lead to higher PageRank and improved search visibility.
The Benefits of Semantic SEO for Search Engines
Semantic SEO not only benefits website owners but also search engines by reducing the cost of understanding documents. With the help of NLP techniques, search engines can efficiently analyze and comprehend content, making it easier to identify and index relevant webpages. This ultimately leads to more accurate search results and a better user experience.
In conclusion, embracing Semantic SEO, Topical Authority, and PageRank is essential for achieving higher search rankings and increased online visibility. By leveraging NLP techniques, Semantic SEO offers a more sophisticated and efficient approach to understanding and optimizing content, ultimately benefiting both website owners and search engines.
Migration Best Practices - Search Y 2019, Paris – Bastian Grimm
My talk from SEARCHY 2019 in Paris covering best practices on how to successfully navigate through the various types of migrations (protocol migrations, frontend migrations, website migration, cms migration, etc.) from an SEO perspective - mainly focussing on all things technical SEO.
Semantic Search Engine: Semantic Search and Query Parsing with Phrases and En... – Koray Tugberk GUBUR
Semantic search engines can understand human language to analyze the need behind a query. Instead of focusing on string or word matching, a semantic search engine focuses on concepts, intents, and the relations between named entities. Taxonomy, ontology, onomastics, semantic role labeling, relation detection, lexical semantics, and entity extraction, recognition, and resolution can all be used by semantic search engines. This PDF traces the evolution of semantic search engines through Google's research papers, patents, and official announcements: from 1998 to 2021, the evolution of search and search engines, from strings to things and from phrases to entities, along with changes in query processing and parsing methodology.
As opposed to lexical search, semantic search looks for meaning, not literal matches of the query words. Semantic search attempts to increase the relevancy of results by understanding the searcher's intent and the context of terms in the searchable dataspace, whether on the open web or within a closed system. The right semantic search content blends natural language, focuses on the user's intent, and considers other topics the user may be interested in.
According to some authors, ontologies, XML, and other structured data sources can be used to retrieve knowledge through semantic search. Such technologies provide a mechanism for creating formal, highly expressive representations of domain knowledge, and may allow the user to express more detailed intent during query processing.
The Next Generation of AI-powered Search – Trey Grainger
What does it really mean to deliver an "AI-powered Search" solution? In this talk, we’ll bring clarity to this topic, showing you how to marry the art of the possible with the real-world challenges involved in understanding your content, your users, and your domain. We'll dive into emerging trends in AI-powered Search, as well as many of the stumbling blocks found in even the most advanced AI and Search applications, showing how to proactively plan for and avoid them. We'll walk through the various uses of reflected intelligence and feedback loops for continuous learning from user behavioral signals and content updates, also covering the increasing importance of virtual assistants and personalized search use cases found within the intersection of traditional search and recommendation engines. Our goal will be to provide a baseline of mainstream AI-powered Search capabilities available today, and to paint a picture of what we can all expect just on the horizon.
A beginner's guide to machine learning for SEOs - WTSFest 2022LazarinaStoyanova
This is a guide for machine learning for beginners, tailored to the SEO industry, aimed at breaking down the challenges that hold us back from experimenting, the breakdown of machine learning's main characteristics to help us understand how to implement it a bit better, and the ways we can embed advanced technology into our daily practice.
elasticsearch의 기본적인 working에 대한 발표자료입니다.
특히나 logging보다는 '검색 서비스'에 포커싱된 자료이기 때문에 '한글검색' 으로 고통받으실 분들을 위한 기초 자료라 생각해주시면 감사하겠습니다.
맞지않는 정보와 오탈자 그리고 의문점이 든다면 dydwls121200@gmail.com으로 언제든지 가벼운 마음으로 메일주세요. 저 또한 성장시키는 일이기도 하니까요. 환영합니다.
How to Incorporate ML in your SERP Analysis, Lazarina Stoy -BrightonSEO Oct, ...LazarinaStoyanova
How are you currently doing SERP analysis? Is your approach efficient or scalable?
How SEOs can incorporate programmatic approaches and advances in machine learning in order to identify winning strategies?
This talk will leave you with a better understanding of what is possible for SERP analysis at scale, what insights you can capture with the help of machine learning quickly, and how to incorporate insights into your strategy and visualize your findings to impress your stakeholders.
Building a Search Intent-Driven Website Architecture (SEO Mastery Summit 2022...LazarinaStoyanova
In this talk, given for SEO Mastery Summit 2022, I:
- propose a continuous improvement planning approach for web architecture
- highlight the importance of goal-driven site architecture planning
- highlight the relationship between site goals, user personas, and search intent
- demonstrate ways of considering search intent in both page planning and structure planning
- highlight the benefits of having an intent-driven website architecture
- highlight different ways to troubleshoot for intent mismatch
- recommend some further reading to get set you on a path of success ✨
If you want to read the talk write-up, head over to: https://lazarinastoy.com/building-a-search-intent-driven-website-architecture/
PubCon, Lazarina Stoy. - Machine Learning in Search: Google's ML APIs vs Open...LazarinaStoyanova
In this presentation, I go through the different use cases of machine learning APIs for search marketers and digital marketers. Specifically, I look at APIs by Google Cloud and OpenAI and identify the best to use for your SEO projects, based on the task at hand.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers that are used to build php/mysql apps to broaden their horizon when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
Whilst passage indexing may seem like a small tweak to search ranking, it is potentially much more symptomatic of the beginning of a fundamental shift in the way that search engines understand unstructured content, determine relevance in natural language, and rank efficiently and effectively.
It could also be a means of assessing overall quality of content and a means of dynamic index pruning. We will look at the landscape, and also provide some takeaways for brands and business owners looking to improve quality in unstructured content overall in this fast changing landscape.
In this talk, Beth will give an introduction of what schema is, where it sits within a structured data framework, and its use cases. Then she’ll move on to the types that can be used or are no longer recognised by search engines. This will be followed by a hands-on discussion of how to audit a website, how to write the relevant code, and ways to implement and test.
The Reason Behind Semantic SEO: Why does Google Avoid the Word PageRank?Koray Tugberk GUBUR
This article delves into the concepts of Semantic SEO, Topical Authority, and PageRank, exploring their relationships and how they benefit both website owners and search engines. By leveraging Natural Language Processing (NLP) techniques, Semantic SEO improves search engine comprehension of content and enhances user experience, ultimately leading to better search results.
In the ever-evolving world of Search Engine Optimization (SEO), understanding the intricate connections between Semantic SEO, Topical Authority, and PageRank is crucial for webmasters, content creators, and marketers. These concepts play a vital role in enhancing the visibility and relevance of websites in search results.
Semantic SEO: Going Beyond Keywords
Semantic SEO involves optimizing content by focusing on the meaning and context of words, phrases, and sentences rather than merely targeting specific keywords. This is achieved through NLP techniques such as topic modeling, sentiment analysis, and entity recognition, which allow search engines to comprehend the true essence of content.
Topical Authority: Establishing Expertise and Trustworthiness
Topical Authority refers to the perceived expertise of a website or content creator in a specific subject area. By producing high-quality, relevant, and in-depth content, websites can establish themselves as authorities, earning the trust of both users and search engines. This translates into higher search rankings and increased visibility.
PageRank: Measuring the Importance of Webpages
PageRank is an algorithm used by Google to determine the significance of a webpage by analyzing the quality and quantity of its inbound links. A higher PageRank implies that a website is more authoritative and valuable, thus warranting a better position in search results.
The Interrelation of Semantic SEO, Topical Authority, and PageRank
Semantic SEO, Topical Authority, and PageRank are interconnected concepts that work in tandem to improve a website's search performance. By focusing on Semantic SEO, content creators can enhance their Topical Authority and establish a solid online presence. This, in turn, can lead to higher PageRank and improved search visibility.
The Benefits of Semantic SEO for Search Engines
Semantic SEO not only benefits website owners but also search engines by reducing the cost of understanding documents. With the help of NLP techniques, search engines can efficiently analyze and comprehend content, making it easier to identify and index relevant webpages. This ultimately leads to more accurate search results and a better user experience.
In conclusion, embracing Semantic SEO, Topical Authority, and PageRank is essential for achieving higher search rankings and increased online visibility. By leveraging NLP techniques, Semantic SEO offers a more sophisticated and efficient approach to understanding and optimizing content, ultimately benefiting both website owners and search engines.
Migration Best Practices - Search Y 2019, ParisBastian Grimm
My talk from SEARCHY 2019 in Paris covering best practices on how to successfully navigate the various types of migrations (protocol migrations, frontend migrations, website migrations, CMS migrations, etc.) from an SEO perspective, mainly focusing on all things technical SEO.
Semantic Search Engine: Semantic Search and Query Parsing with Phrases and En...Koray Tugberk GUBUR
Semantic search engines can understand human language to analyze the need behind a query. Instead of focusing on string or word matching, a semantic search engine focuses on concepts, intents, and the relations between named entities. Taxonomy, ontology, onomastics, semantic role labeling, relation detection, lexical semantics, and entity extraction, recognition, and resolution can all be used by semantic search engines. In this PDF file, the evolution of semantic search engines is traced through Google's research papers, patents, and official announcements. From 1998 to 2021, the evolution of search and search engines, from strings to things and from phrases to entities, is covered along with changes in query processing and parsing methodology.
As opposed to lexical search, semantic search looks for meaning rather than literal matches of the query words. Semantic search attempts to increase the relevancy of results by understanding searchers' intents and the context of terms in the searchable dataspace, whether online or within a closed system. The right semantic search content is a blend of natural language that focuses on the intent of the user and considers other topics the user may be interested in.
According to some authors, ontologies, XML, and other structured data sources can be used to retrieve knowledge via semantic search. Such technologies provide a mechanism for creating formal, highly expressive representations of domain knowledge, and may allow the user to express more detailed intent during query processing.
The Next Generation of AI-powered SearchTrey Grainger
What does it really mean to deliver an "AI-powered Search" solution? In this talk, we’ll bring clarity to this topic, showing you how to marry the art of the possible with the real-world challenges involved in understanding your content, your users, and your domain. We'll dive into emerging trends in AI-powered Search, as well as many of the stumbling blocks found in even the most advanced AI and Search applications, showing how to proactively plan for and avoid them. We'll walk through the various uses of reflected intelligence and feedback loops for continuous learning from user behavioral signals and content updates, also covering the increasing importance of virtual assistants and personalized search use cases found within the intersection of traditional search and recommendation engines. Our goal will be to provide a baseline of mainstream AI-powered Search capabilities available today, and to paint a picture of what we can all expect just on the horizon.
The Next Generation of AI-Powered SearchLucidworks
Trey Grainger, Chief Algorithms Officer, at Lucidworks delivers the closing keynote for ACTIVATE 2019, the Search and AI Conference hosted by Lucidworks.
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
This presentation is from the inaugural Atlanta Solr Meetup held on 2014/10/21 at Atlanta Tech Village.
Description: CareerBuilder uses Solr to power their recommendation engine, semantic search, and data analytics products. They maintain an infrastructure of hundreds of Solr servers, holding over a billion documents and serving over a million queries an hour across thousands of unique search indexes. Come learn how CareerBuilder has integrated Solr into their technology platform (with assistance from Hadoop, Cassandra, and RabbitMQ) and walk through API and code examples to see how you can use Solr to implement your own real-time recommendation engine, semantic search, and data analytics solutions.
Speaker: Trey Grainger is the Director of Engineering for Search & Analytics at CareerBuilder.com and is the co-author of Solr in Action (2014, Manning Publications), the comprehensive example-driven guide to Apache Solr. His search experience includes handling multi-lingual content across dozens of markets/languages, machine learning, semantic search, big data analytics, customized Lucene/Solr scoring models, data mining and recommendation systems. Trey is also the Founder of Celiaccess.com, a gluten-free search engine, and is a frequent speaker at Lucene and Solr-related conferences.
This presentation was given at one of the DSATL Meetups in March 2018 in partnership with the Southern Data Science Conference 2018 (www.southerndatascience.com).
Presented at EuroIA17, September 2017; World IA Day NYC, February 2017; Interact, October 2016 (London, UK); earlier versions in 2014 at UXPA Boston (Boston, MA, USA); in 2013 at Interaction S.A. (Recife, Brasil), Intuit (Mountain View, CA, USA), Designers + Geeks (New York, USA); in 2012 at UX Russia (Moscow, Russia), UX Hong Kong (Hong Kong, China), WebVisions NYC (New York, NY, USA); in 2011 at the IA Summit (Denver, CO, USA), UX-LX (Lisbon, Portugal), Love at First Website (Portland, OR, USA).
This is something of a successor to my talk "Marrying Web Analytics and User Experience" (http://is.gd/vK34zS)
Measuring Impact: Towards a data citation metricEdward Baker
How the ViBRANT and eMonocot projects are building tools, including a modified implementation of Bourne and Fink's 'Scholar Factor', the Biodiversity Data Journal, and Scratchpad's user metrics and statistics modules.
Before Code: How to Plan & Visualize Your Projectmelmiller
Melinda Miller (@melindamiller) and Cate Kompare (@CateKompare) share UX techniques and their research-analysis-design framework at the University of Illinois Web Conference on April 29, 2014 in Champaign, IL.
UX Cards: http://goo.gl/IoSnNi
Resource Link: http://goo.gl/y2t2Mb
A Journey With Microsoft Cognitive Services IIMarvin Heng
This slide deck is about Microsoft Cognitive Services. By going through it, you will understand what Microsoft Cognitive Services are and how they work.
Marvin Heng
Medium: @hmheng
Twitter: @hmheng
Github: hmheng
Using sharepoint to solve business problems #spsnairobi2014Amos Wachanga
Using sharepoint to solve business problems #spsnairobi2014. This presentation was done by Amos Wachanga of Techno Brain Ltd at Sharepoint Saturday Nairobi event on 18th Oct 2014, held at Techno Brain HQ in Nairobi, Kenya.
The presentation creates a business scenario at the start, then introduces SharePoint and mentions some key features that would solve the identified business problems, and finally, using the case study and examples, ties it all together through a typical solution creation for the business scenario.
Slides for my full-day information architecture workshop. Will teach in Minneapolis, MN (November 12, 2012) and Toronto, ON (November 29, 2012) Details: http://rosenfeldmedia.com/workshops/
Has your app taken off? Are you thinking about scaling? MongoDB makes it easy to horizontally scale out with built-in automatic sharding, but did you know that sharding isn't the only way to achieve scale with MongoDB?
In this webinar, we'll review three different ways to achieve scale with MongoDB. We'll cover how you can optimize your application design and configure your storage to achieve scale, as well as the basics of horizontal scaling. You'll walk away with a thorough understanding of options to scale your MongoDB application.
Topics covered include:
- Scaling Vertically
- Hardware Considerations
- Index Optimization
- Schema Design
- Sharding
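The sharding idea above can be illustrated with a toy routing function. This is not MongoDB's actual implementation (MongoDB uses a 64-bit hash and chunk ranges managed by the config servers); it is only a stand-in to show why a hashed shard key spreads write load evenly where a monotonically increasing key would not.

```python
import hashlib

# Illustrative stand-in for hashed shard-key routing: hash the key value
# and map it onto one of N shards.
def shard_for(key_value, num_shards):
    digest = hashlib.md5(str(key_value).encode()).hexdigest()
    return int(digest, 16) % num_shards

counts = [0, 0, 0]
for user_id in range(10_000):
    counts[shard_for(user_id, 3)] += 1
# A monotonically increasing key (like an ObjectId's timestamp prefix)
# would pile all inserts onto one shard under range-based sharding;
# hashing spreads the same inserts roughly evenly across all three.
```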
Reflected intelligence evolving self-learning data systemsTrey Grainger
In this presentation, we’ll talk about evolving self-learning search and recommendation systems which are able to accept user queries, deliver relevance-ranked results, and iteratively learn from the users’ subsequent interactions to continually deliver a more relevant experience. Such a self-learning system leverages reflected intelligence to consistently improve its understanding of the content (documents and queries), the context of specific users, and the collective feedback from all prior user interactions with the system. Through iterative feedback loops, such a system can leverage user interactions to learn the meaning of important phrases and topics within a domain, identify alternate spellings and disambiguate multiple meanings of those phrases, learn the conceptual relationships between phrases, and even learn the relative importance of features to automatically optimize its own ranking algorithms on a per-query, per-category, or per-user/group basis.
All Things Open 2014 - Day 1
Wednesday, October 22nd, 2014
Garth Braithwaite
Open Source Design & Code Contributor for Adobe
Design
Open Source Needs Design
Find more by Garth here: https://speakerdeck.com/garthdb
Reflected Intelligence: Real world AI in Digital TransformationTrey Grainger
The goal of most digital transformations is to create competitive advantage by enhancing customer experience and employee success, so giving these stakeholders the ability to find the right information at their moment of need is paramount. Employees and customers increasingly expect an intuitive, interactive experience where they can simply type or speak their questions or keywords into a search box, their intent will be understood, and the best answers and content are then immediately presented.
Providing this compelling experience, however, requires a deep understanding of your content, your unique business domain, and the collective and personalized needs of each of your users. Modern artificial intelligence (AI) approaches are able to continuously learn from both your content and the ongoing stream of user interactions with your applications, and to automatically reflect back that learned intelligence in order to instantly and scalably deliver contextually-relevant answers to employees and customers.
In this talk, we'll discuss how AI is currently being deployed across the Fortune 1000 to accomplish these goals, both in the digital workplace (helping employees more efficiently get answers and make decisions) and in digital commerce (understanding customer intent and connecting them with the best information and products). We'll separate fact from fiction as we break down the hype around AI and show how it is being practically implemented today to power many real-world digital transformations for the next generation of employees and customers.
Natural Language Search with Knowledge Graphs (Chicago Meetup)Trey Grainger
To optimally interpret most natural language queries, it's important to develop a highly-nuanced, contextual interpretation of the domain-specific phrases, entities, commands, and relationships represented or implied within the search and within your domain.
In this talk, we'll walk through such a search system powered by Solr's Text Tagger and Semantic Knowledge graph. We'll have fun with some of the more search-centric use cases of knowledge graphs, such as entity extraction, query expansion, disambiguation, and pattern identification within our queries: for example, transforming the query "best bbq near activate" into:
{!func}mul(min(popularity,1),100) bbq^0.91032 ribs^0.65674 brisket^0.63386 doc_type:"restaurant" {!geofilt d=50 sfield="coordinates_pt" pt="38.916120,-77.045220"}
We'll see a live demo with real world data demonstrating how you can build and apply your own knowledge graphs to power much more relevant query understanding like this within your search engine.
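As a rough sketch of how an expanded query string like the one above might be assembled, consider the following. The helper function and its expansion weights are hypothetical; in the talk, the weighted concept expansions come from the Semantic Knowledge Graph itself.

```python
# Hypothetical sketch: assemble a Solr query string of the form shown
# above from knowledge-graph concept expansions plus a type filter and
# a geo filter.
def expand_query(concept_weights, doc_type, lat, lon, radius_km):
    boosted = " ".join(f"{term}^{w:.5f}" for term, w in concept_weights)
    return (
        '{!func}mul(min(popularity,1),100) '
        f'{boosted} doc_type:"{doc_type}" '
        f'{{!geofilt d={radius_km} sfield="coordinates_pt" '
        f'pt="{lat:.6f},{lon:.6f}"}}'
    )

q = expand_query(
    [("bbq", 0.91032), ("ribs", 0.65674), ("brisket", 0.63386)],
    "restaurant", 38.916120, -77.045220, 50)
```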
Natural Language Search with Knowledge Graphs (Activate 2019)Trey Grainger

Natural Language Search with Knowledge Graphs (Haystack 2019)Trey Grainger
To optimally interpret most natural language queries, it is necessary to understand the phrases, entities, commands, and relationships represented or implied within the search. Knowledge graphs serve as useful instantiations of ontologies which can help represent this kind of knowledge within a domain.
In this talk, we'll walk through techniques to build knowledge graphs automatically from your own domain-specific content, how you can update and edit the nodes and relationships, and how you can seamlessly integrate them into your search solution for enhanced query interpretation and semantic search. We'll have some fun with some of the more search-centric use cases of knowledge graphs, such as entity extraction, query expansion, disambiguation, and pattern identification within our queries: for example, transforming the query "bbq near haystack" into
{ "filter": ["doc_type:restaurant"], "query": { "boost": { "b": "recip(geodist(38.034780,-78.486790),1,1000,1000)", "query": "bbq OR barbeque OR barbecue" } } }
We'll also specifically cover use of the Semantic Knowledge Graph, a particularly interesting knowledge graph implementation available within Apache Solr that can be auto-generated from your own domain-specific content and which provides highly-nuanced, contextual interpretation of all of the terms, phrases and entities within your domain. We'll see a live demo with real world data demonstrating how you can build and apply your own knowledge graphs to power much more relevant query understanding within your search engine.
Closing keynote by Trey Grainger from Activate 2018 in Montreal, Canada. Covers trends in the intersection of Search (Information Retrieval) and Artificial Intelligence, and the underlying capabilities needed to deliver those trends at scale.
The Relevance of the Apache Solr Semantic Knowledge GraphTrey Grainger
The Semantic Knowledge Graph is an Apache Solr plugin that can be used to discover and rank the relationships between any arbitrary queries or terms within the search index. It is a relevancy swiss army knife, able to discover related terms and concepts, disambiguate different meanings of terms given their context, cleanup noise in datasets, discover previously unknown relationships between entities across documents and fields, rank lists of keywords based upon conceptual cohesion to reduce noise, summarize documents by extracting their most significant terms, generate recommendations and personalized search, and power numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. This talk will walk you through how to setup and use this plugin in concert with other open source tools (probabilistic query parser, SolrTextTagger for entity extraction) to parse, interpret, and much more correctly model the true intent of user searches than traditional keyword-based search approaches.
"Searching for Meaning: The Hidden Structure in Unstructured Data". Presentation by Trey Grainger at the Southern Data Science Conference (SDSC) 2018. Covers linguistic theory, application in search and information retrieval, and knowledge graph and ontology learning methods for automatically deriving contextualized meaning from unstructured (free text) content.
The Apache Solr Semantic Knowledge GraphTrey Grainger
What if instead of a query returning documents, you could alternatively return other keywords most related to the query: i.e. given a search for "data science", return me back results like "machine learning", "predictive modeling", "artificial neural networks", etc.? Solr’s Semantic Knowledge Graph does just that. It leverages the inverted index to automatically model the significance of relationships between every term in the inverted index (even across multiple fields) allowing real-time traversal and ranking of any relationship within your documents. Use cases for the Semantic Knowledge Graph include disambiguation of multiple meanings of terms (does "driver" mean truck driver, printer driver, a type of golf club, etc.), searching on vectors of related keywords to form a conceptual search (versus just a text match), powering recommendation algorithms, ranking lists of keywords based upon conceptual cohesion to reduce noise, summarizing documents by extracting their most significant terms, and numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. In this talk, we'll do a deep dive into the internals of how the Semantic Knowledge Graph works and will walk you through how to get up and running with an example dataset to explore the meaningful relationships hidden within your data.
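The relatedness intuition behind "given 'data science', return 'machine learning'" can be sketched numerically. The formula below is a simplified z-score approximation of the idea (terms appearing far more often in the foreground set of matching documents than the background corpus would predict score highly); the actual Solr plugin's scoring differs in its details.

```python
import math

# Simplified relatedness sketch: compare a term's frequency in the
# foreground (docs matching the query) against its background rate in
# the whole corpus, expressed as a z-score.
def relatedness(fg_with_term, fg_total, bg_with_term, bg_total):
    p_bg = bg_with_term / bg_total          # background probability
    expected = fg_total * p_bg              # expected foreground count
    variance = fg_total * p_bg * (1 - p_bg)
    return (fg_with_term - expected) / math.sqrt(variance) if variance else 0.0

# "machine learning" appears in 90 of 100 docs matching "data science",
# but only 500 of 100,000 docs overall -> strongly related.
score_ml = relatedness(90, 100, 500, 100_000)
# A term matching its corpus-wide rate scores near zero (noise).
score_noise = relatedness(6, 100, 5_000, 100_000)
```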
Building Search & Recommendation EnginesTrey Grainger
In this talk, you'll learn how to build your own search and recommendation engine based on the open source Apache Lucene/Solr project. We'll dive into some of the data science behind how search engines work, covering multi-lingual text analysis, natural language processing, relevancy ranking algorithms, knowledge graphs, reflected intelligence, collaborative filtering, and other machine learning techniques used to drive relevant results for free-text queries. We'll also demonstrate how to build a recommendation engine leveraging the same platform and techniques that power search for most of the world's top companies. You'll walk away from this presentation with the toolbox you need to go and implement your very own search-based product using your own data.
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsTrey Grainger
Search engines, recommendation systems, advertising networks, and even data analytics tools all share the same end goal - to deliver the most relevant information possible to meet a given information need (usually in real-time). Perfecting these systems requires algorithms which can build a deep understanding of the domains represented by the underlying data, understand the nuanced ways in which words and phrases should be parsed and interpreted within different contexts, score the relationships between arbitrary phrases and concepts, continually learn from users' context and interactions to make the system smarter, and generate custom models of personalized tastes for each user of the system.
In this talk, we'll dive into both the philosophical questions associated with such systems ("how do you accurately represent and interpret the meaning of words?", "How do you prevent filter bubbles?", etc.), as well as look at practical examples of how these systems have been successfully implemented in production systems combining a variety of available commercial and open source components (inverted indexes, entity extraction, similarity scoring and machine-learned ranking, auto-generated knowledge graphs, phrase interpretation and concept expansion, etc.).
Self-learned Relevancy with Apache SolrTrey Grainger
Search engines are known for "relevancy", but the relevancy models that ship out of the box (BM25, classic tf-idf, etc.) are just scratching the surface of what's needed for a truly insightful application.
What if your search engine could automatically tune its own domain-specific relevancy model based on user interactions? What if it could learn the important phrases and topics within your domain, learn the conceptual relationships embedded within your documents, and even use machine-learned ranking to discover the relative importance of different features and then automatically optimize its own ranking algorithms for your domain? What if you could further use SQL queries to explore these relationships within your own BI tools and return results in ranked order to deliver relevance-driven analytics visualizations?
In this presentation, we'll walk through how you can leverage the myriad of capabilities in the Apache Solr ecosystem (such as the Solr Text Tagger, Semantic Knowledge Graph, Spark-Solr, Solr SQL, learning to rank, probabilistic query parsing, and Lucidworks Fusion) to build self-learning, relevance-first search, recommendations, and data analytics applications.
Search engines, and Apache Solr in particular, are quickly shifting the focus away from “big data” systems storing massive amounts of raw (but largely unharnessed) content, to “smart data” systems where the most relevant and actionable content is quickly surfaced instead. Apache Solr is the blazing-fast and fault-tolerant distributed search engine leveraged by 90% of Fortune 500 companies. As a community-driven open source project, Solr brings in diverse contributions from many of the top companies in the world, particularly those for whom returning the most relevant results is mission critical.
Out of the box, Solr includes advanced capabilities like learning to rank (machine-learned ranking), graph queries and distributed graph traversals, job scheduling for processing batch and streaming data workloads, the ability to build and deploy machine learning models, and a wide variety of query parsers and functions allowing you to very easily build highly relevant and domain-specific semantic search, recommendations, or personalized search experiences. These days, Solr even enables you to run SQL queries directly against it, mixing and matching the full power of Solr's free-text, geospatial, and other search capabilities with a prominent query language already known by most developers (and which many external systems can use to query Solr directly).
Due to the community-oriented nature of Solr, the ecosystem of capabilities also spans well beyond just the core project. In this talk, we’ll also cover several other projects within the larger Apache Lucene/Solr ecosystem that further enhance Solr’s smart data capabilities: bi-directional integration of Apache Spark and Solr’s capabilities, large-scale entity extraction, semantic knowledge graphs for discovering, traversing, and scoring meaningful relationships within your data, auto-generation of domain-specific ontologies, running SPARQL queries against Solr on RDF triples, probabilistic identification of key phrases within a query or document, conceptual search leveraging Word2Vec, and even Lucidworks’ own Fusion project which extends Solr to provide an enterprise-ready smart data platform out of the box.
We’ll dive into how all of these capabilities can fit within your data science toolbox, and you’ll come away with a really good feel for how to build highly relevant “smart data” applications leveraging these key technologies.
South Big Data Hub: Text Data Analysis PanelTrey Grainger
Slides from Trey's opening presentation for the South Big Data Hub's Text Data Analysis Panel on December 8th, 2016. Trey provided a quick introduction to Apache Solr, described how companies are using Solr to power relevant search in industry, and provided a glimpse on where the industry is heading with regard to implementing more intelligent and relevant semantic search.
Presentation of the Semantic Knowledge Graph research paper at the 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (Montreal, Canada - October 18th, 2016)
Abstract—This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, any combination of nodes can have edges to any other nodes materialize and be scored to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendation systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) through dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain.
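The paper's core mechanism, edges materializing from intersecting postings lists, can be shown with a toy example. The postings data here is invented purely for illustration.

```python
# Toy sketch of the Semantic Knowledge Graph's representation: nodes are
# terms, and an edge between two terms materializes dynamically as the
# intersection of their postings lists (the docs containing each term).
postings = {
    "java":    {1, 2, 3, 7},
    "hadoop":  {2, 3, 9},
    "nursing": {5, 6},
}

def edge(term_a, term_b):
    # the edge's underlying documents are the postings-list intersection
    return postings[term_a] & postings[term_b]

# java and hadoop share docs {2, 3}; java and nursing share none, so no
# edge materializes between them
```

Because edges are computed on demand from corpus statistics rather than stored explicitly, any arbitrary combination of nodes can be connected and scored, which is exactly the layer of indirection the abstract describes.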
Reflected Intelligence: Lucene/Solr as a self-learning data systemTrey Grainger
What if your search engine could automatically tune its own domain-specific relevancy model? What if it could learn the important phrases and topics within your domain, automatically identify alternate spellings (synonyms, acronyms, and related phrases) and disambiguate multiple meanings of those phrases, learn the conceptual relationships embedded within your documents, and even use machine-learned ranking to discover the relative importance of different features and then automatically optimize its own ranking algorithms for your domain?
In this presentation, you'll learn how to do just that: how to evolve Lucene/Solr implementations into self-learning data systems which are able to accept user queries, deliver relevance-ranked results, and automatically learn from your users' subsequent interactions to continually deliver a more relevant experience for each keyword, category, and group of users.
Such a self-learning system leverages reflected intelligence to consistently improve its understanding of the content (documents and queries), the context of specific users, and the relevance signals present in the collective feedback from every prior user interaction with the system. Come learn how to move beyond manual relevancy tuning and toward a closed-loop system leveraging both the embedded meaning within your content and the wisdom of the crowds to automatically generate search relevancy algorithms optimized for your domain.
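A minimal sketch of such a feedback loop: aggregating user behavioral signals into per-query document boosts that can be fed back into ranking. The signal format and weighting scheme here are hypothetical, intended only to show the shape of the loop.

```python
from collections import defaultdict

# Hypothetical behavioral signals: (query, document, signal_type) tuples
# collected from search logs.
signals = [
    ("cheap laptop", "doc42", "click"),
    ("cheap laptop", "doc42", "click"),
    ("cheap laptop", "doc17", "click"),
    ("cheap laptop", "doc42", "purchase"),
]
WEIGHTS = {"click": 1.0, "purchase": 5.0}  # purchases signal more intent

# Aggregate signals into relevance boosts per (query, document) pair.
boosts = defaultdict(float)
for query, doc, signal_type in signals:
    boosts[(query, doc)] += WEIGHTS[signal_type]
# doc42 earns a boost of 7.0 for "cheap laptop", doc17 earns 1.0; on the
# next index update these boosts are reflected back into ranking.
```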
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineTrey Grainger
Search engines frequently miss the mark when it comes to understanding user intent. This talk will describe how to overcome this by leveraging Lucene/Solr to power a knowledge graph that can extract phrases, understand and weight the semantic relationships between those phrases and known entities, and expand the query to include those additional conceptual relationships. For example, if a user types in (Senior Java Developer Portland, OR Hadoop), you or I know that the term “senior” designates an experience level, that “java developer” is a job title related to “software engineering”, that “portland, or” is a city with a specific geographical boundary, and that “hadoop” is a technology related to terms like “hbase”, “hive”, and “map/reduce”. Out of the box, however, most search engines just parse this query as text:((senior AND java AND developer AND portland) OR (hadoop)), which is not at all what the user intended. We will discuss how to train the search engine to parse the query into this intended understanding, and how to reflect this understanding to the end user to provide an insightful, augmented search experience. Topics: Semantic Search, Finite State Transducers, Probabilistic Parsing, Bayes Theorem, Augmented Search, Recommendations, NLP, Knowledge Graphs
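The query-parsing example above can be sketched with a greedy longest-match tagger. Real systems use finite state transducers (such as the SolrTextTagger) and probabilistic parsing; this simplified version, with an invented mini-dictionary, just illustrates the idea of lifting raw keywords into typed entities.

```python
# Hypothetical entity dictionary mapping known phrases to (type, value).
ENTITIES = {
    "senior": ("experience_level", "senior"),
    "java developer": ("job_title", "java developer"),
    "portland, or": ("city", "portland, or"),
    "hadoop": ("skill", "hadoop"),
}

def tag_query(query):
    words = query.lower().split()
    tags, i = [], 0
    while i < len(words):
        # try the longest phrase first so "java developer" beats "java"
        for length in range(len(words) - i, 0, -1):
            phrase = " ".join(words[i:i + length])
            if phrase in ENTITIES:
                tags.append(ENTITIES[phrase])
                i += length
                break
        else:
            tags.append(("keyword", words[i]))
            i += 1
    return tags

parsed = tag_query("Senior Java Developer Portland, OR Hadoop")
# -> experience_level, job_title, city, skill: the intended parse,
# rather than six unrelated AND'ed keywords
```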
Semantic & Multilingual Strategies in Lucene/SolrTrey Grainger
When searching on text, choosing the right CharFilters, Tokenizer, stemmers, and other TokenFilters for each supported language is critical. Additional tools of the trade include language detection through UpdateRequestProcessors, parts of speech analysis, entity extraction, stopword and synonym lists, relevancy differentiation for exact vs. stemmed vs. conceptual matches, and identification of statistically interesting phrases per language. For multilingual search, you also need to choose between several strategies such as: searching across multiple fields, using a separate collection per language combination, or combining multiple languages in a single field (custom code is required for this and will be open sourced). These all have their own strengths and weaknesses depending upon your use case. This talk will provide a tutorial (with code examples) on how to pull off each of these strategies as well as compare and contrast the different kinds of stemmers, review the precision/recall impact of stemming vs. lemmatization, and describe some techniques for extracting meaningful relationships between terms to power a semantic search experience per-language. Come learn how to build an excellent semantic and multilingual search system using the best tools and techniques Lucene/Solr has to offer!
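The analysis-chain composition described above can be sketched as a simple pipeline. Real Solr chains use per-language CharFilters, Tokenizers, and stemmers (Porter, Snowball, or lemmatizers); this naive English-only version only shows how the stages compose.

```python
import re

# Minimal Lucene-style analysis chain: tokenizer followed by token
# filters (lowercasing, stopword removal, naive suffix stripping).
STOPWORDS = {"the", "a", "of", "and"}

def tokenize(text):
    return re.findall(r"\w+", text)

def analyze(text):
    tokens = [t.lower() for t in tokenize(text)]          # lowercase filter
    tokens = [t for t in tokens if t not in STOPWORDS]    # stopword filter
    return [re.sub(r"(ing|s)$", "", t) for t in tokens]   # naive stemmer

terms = analyze("The Semantics of Searching")
# -> ["semantic", "search"]
```

Swapping any single stage (e.g. replacing the naive stemmer with a lemmatizer, or the stopword list with a per-language one) changes the precision/recall trade-off without touching the rest of the chain, which is the design point of the filter-chain architecture.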
Crowdsourced query augmentation through the semantic discovery of domain spec...Trey Grainger
Talk Abstract: Most work in semantic search has thus far focused upon either manually building language-specific taxonomies/ontologies or upon automatic techniques such as clustering or dimensionality reduction to discover latent semantic links within the content that is being searched. The former is very labor intensive and is hard to maintain, while the latter is prone to noise and may be hard for a human to understand or to interact with directly. We believe that the links between similar user’s queries represent a largely untapped source for discovering latent semantic relationships between search terms. The proposed system is capable of mining user search logs to discover semantic relationships between key phrases in a manner that is language agnostic, human understandable, and virtually noise-free.
2. About Me
Trey Grainger, Chief Algorithms Officer
• Previously: SVP of Engineering @ Lucidworks; Director of Engineering @ CareerBuilder
• Georgia Tech – MBA, Management of Technology
• Furman University – BA, Computer Science, Business, & Philosophy
• Stanford University – Information Retrieval & Web Search
Other fun projects:
• Co-author of Solr in Action, plus numerous research publications
• Advisor to Presearch, the decentralized search engine
• Lucene / Solr contributor
3. Agenda
• About Lucidworks
• What is AI-powered Search?
• The Dimensions of User Intent
• Content Understanding: Keyword Search
• User Understanding: Collaborative Recommendations
• Content Understanding + User Understanding: Personalized Search
• Domain Understanding: Knowledge Graphs
• Domain Understanding + User Understanding: Domain-aware Matching
• Content Understanding + Domain Understanding: Semantic Search
• Balancing Approaches: Keyword vs. Vector vs. Knowledge Graph Search
• Vector Search
• Knowledge Graph Search
• Combining it all together
4. Who are we?
• 300+ customers across the Fortune 1000
• 400+ employees
• Offices in San Francisco, CA (HQ); Raleigh-Durham, NC; Cambridge, UK; Bangalore, India; Hong Kong
• Company behind The Search & AI Conference
• Development, hosting, & support
5. Proudly built with open-source tech at its core: Apache Solr & Apache Spark. Personalizes search with applied machine learning. Proven on the world's biggest information systems.
16. BM25 (Relevance Scoring between Query and Documents)

score(q, d) = Σ_{t ∈ q} idf(t) · ( tf(t, d) · (k + 1) ) / ( tf(t, d) + k · (1 − b + b · |d| / avgdl) )

Where:
t = term; d = document; q = query; i = index
tf(t, d) = √(numTermOccurrencesInDocument)
idf(t) = 1 + log( numDocs / (docFreq + 1) )
|d| = Σ_{t ∈ d} 1 (number of terms in d)
avgdl = ( Σ_{d ∈ i} |d| ) / ( Σ_{d ∈ i} 1 ) (average document length across the index)
k = free parameter, usually ~1.2 to 2.0; raises the term-frequency saturation point.
b = free parameter, usually ~0.75; increases the impact of document-length normalization.
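The scoring formula above can be sketched in a few lines of Python. This is a toy illustration over a tiny hand-made corpus, following the slide's formulation (including the square-root dampening of term frequency); production engines compute these statistics from the inverted index.

```python
import math

def bm25_score(query_terms, doc, corpus, k=1.2, b=0.75):
    """Score one document against a query using the BM25 formula above."""
    num_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / num_docs
    score = 0.0
    for t in query_terms:
        doc_freq = sum(1 for d in corpus if t in d)
        idf = 1 + math.log(num_docs / (doc_freq + 1))
        tf = math.sqrt(doc.count(t))  # the slide's sqrt dampening of raw counts
        score += idf * (tf * (k + 1)) / (tf + k * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    "software engineer at a great company".split(),
    "a registered nurse at hospital doing hard work".split(),
    "a software engineer or a java engineer doing work".split(),
]
scores = [bm25_score(["software", "engineer"], d, corpus) for d in corpus]
```

Note how the shorter first document outscores the longer third one even though "engineer" occurs twice in the latter: length normalization (the b term) and saturation (the k term) doing their job.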
26. What is a Knowledge Graph?
(vs. Ontology vs. Taxonomy vs. Synonyms, etc.)
27. Overly Simplistic Definitions
Alternative Labels: substitute words with identical meanings
[ CTO => Chief Technology Officer; specialise => specialize ]
Synonyms List: provides substitute words that can be used to represent the same or very similar things
[ human => homo sapiens, mankind; food => sustenance, meal ]
Taxonomy: classifies things into categories
[ john is Human; Human is Mammal; Mammal is Animal ]
Ontology: defines relationships between types of things
[ animal eats food; human is animal ]
Knowledge Graph: an instantiation of an ontology (contains the actual things that are related)
[ john is human; john eats food ]
A Knowledge Graph subsumes the other types.
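These distinctions can be made concrete with a minimal sketch, using the slide's own example entities. The taxonomy stores is-a edges, the ontology stores relationships between types, and the knowledge graph instantiates those relationships with concrete things.

```python
# Taxonomy: is-a edges between an entity/category and its parent category
taxonomy = {"john": "Human", "Human": "Mammal", "Mammal": "Animal"}

# Ontology: relationships between *types* of things
ontology = {("Animal", "eats", "Food"), ("Human", "is", "Animal")}

# Knowledge graph: an instantiation of the ontology with concrete things
knowledge_graph = {("john", "is", "human"), ("john", "eats", "food")}

def ancestors(entity):
    """Walk the taxonomy's is-a chain upward from an entity."""
    chain = []
    while entity in taxonomy:
        entity = taxonomy[entity]
        chain.append(entity)
    return chain
```

Because the knowledge graph holds typed relationships between instances, the simpler structures (synonym lists, taxonomies) fall out as special cases of its edges, which is the sense in which it subsumes them.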
32.
• Keyword Search (completely user-specified)
• Personalized Queries (mostly user-specified, partially driven by user profile)
• User-guided Recommendations (mostly driven by user profile, partially user-specified)
• Traditional Recommendations (completely driven by user behavior)
33. Personalized Search
• Keyword Search (completely user-specified)
• Personalized Queries (mostly user-specified, partially driven by user profile)
• User-guided Recommendations (mostly driven by user profile, partially user-specified)
• Traditional Recommendations (completely driven by user behavior)
42.
• Collaborative Recommendations (completely driven by user behavior)
• Personas / User Profiles (user attributes and preferences in a knowledge graph)
• Multimodal Recommendations (recommendations combining collaborative filtering plus user-based profile attribute matching/ranking)
• Knowledge Graph (understanding conceptual and logical relationships between domain-specific entities)
43. Domain-aware Matching
• Collaborative Recommendations (completely driven by user behavior)
• Personas / User Profiles (user attributes and preferences in a knowledge graph)
• Multimodal Recommendations (recommendations combining collaborative filtering plus user-based profile attribute matching/ranking)
• Knowledge Graph (understanding conceptual and logical relationships between domain-specific entities)
48. Multimodal Recommendations

Jane is a nurse educator in Boston seeking between $40K and $60K.
She has interacted with the same content as the following users: u99,u1,u50,u2311,u253,u70,u99

http://localhost:8983/solr/jobs/select/?
  fl=jobtitle,city,state,salary&
  q=(
      jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10
    )
    AND (
      (city:"Boston" AND state:"MA")^15
      OR state:"MA"
    )
    AND _val_:"map(salary,40000,60000,10,0)"
    AND similar_users:{!terms}u99,u1,u50,u2311,u253,u70,u99

*Example derived from chapter 16 of Solr in Action
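A request like this is usually assembled programmatically; a minimal sketch with Python's standard library is below. The endpoint, core name (jobs), and field names are the slide's demo values, not a general Solr convention, and urlencode takes care of the escaping.

```python
from urllib.parse import urlencode

# Build the multimodal query string exactly as shown on the slide
q = (
    '(jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10)'
    ' AND ((city:"Boston" AND state:"MA")^15 OR state:"MA")'
    ' AND _val_:"map(salary,40000,60000,10,0)"'
    ' AND similar_users:{!terms}u99,u1,u50,u2311,u253,u70,u99'
)
params = {"fl": "jobtitle,city,state,salary", "q": q}
url = "http://localhost:8983/solr/jobs/select/?" + urlencode(params)
```

From here the URL can be fetched with any HTTP client; the point is that keyword clauses, domain boosts, a salary function query, and a collaborative similar-users filter all compose inside one q parameter.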
61. Vector Similarity Scores: Performance Considerations

Problem: Vector Scoring is Slow
• Unlike keyword search, which looks up pre-indexed answers to queries, vector search must instead calculate similarities between the query vector and every document's vectors to determine the best matches, which is slow at scale.

Solution: Quantized Vectors
• "Quantization" is the process of mapping vector features to discrete values.
• Creating "tokens" which map to a similar vector space enables matching on those tokens to perform an ANN (Approximate Nearest Neighbor) search.
• This converts vector scoring into a search problem (term lookup and scoring), which is fast again, at the expense of some recall and scoring accuracy.

Recommended Approach: Quantized Vector Search + Vector Similarity Reranking
• Combine the best of both worlds by running an initial ANN search on a quantized vector representation, and then re-rank the top-N results using full vector similarity scoring.
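The recommended two-stage approach can be sketched with a deliberately simple quantizer: one sign bit per feature, a cheap Hamming-distance pass to select candidates, then exact cosine re-ranking of the top N. Real systems use ANN structures such as HNSW graphs or product quantization; this is only an illustration of the shape of the pipeline.

```python
import math

def quantize(vec):
    """Binary (sign) quantization: one bit per feature."""
    return tuple(1 if x >= 0 else 0 for x in vec)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query, docs, n_candidates=2):
    """Stage 1: cheap Hamming match on quantized codes (ANN-style).
    Stage 2: exact cosine re-ranking of the surviving candidates."""
    q_code = quantize(query)
    candidates = sorted(docs, key=lambda d: hamming(q_code, quantize(d["vec"])))[:n_candidates]
    return sorted(candidates, key=lambda d: cosine(query, d["vec"]), reverse=True)

docs = [
    {"id": "a", "vec": [0.9, 0.8, -0.1]},
    {"id": "b", "vec": [-0.7, 0.6, 0.2]},
    {"id": "c", "vec": [0.8, 0.7, -0.2]},
]
results = search([1.0, 0.9, -0.1], docs)
```

The first pass never touches floating-point similarity for most of the corpus, which is where the speedup comes from; the second pass restores scoring accuracy for the documents that matter.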
78. Vector Encoders
• Take queries, documents, sentences, paragraphs, etc. and transform them into vectors.
• Usually leverage deep learning, which can discover rich language-usage rules and map them to combinations of features in the vector.
• Popular models and libraries: BERT, ELMo, Universal Sentence Encoder, Word2Vec, Sentence2Vec, GloVe, fastText, and many more.
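To show the shape of the encode-then-compare pipeline without a deep-learning dependency, here is a toy stand-in: a bag-of-words count vector over a fixed vocabulary. The encode function and vocabulary are illustrative assumptions; real encoders like the ones listed above learn dense vectors that capture meaning beyond exact word overlap.

```python
import math

def encode(text, vocabulary):
    """Toy 'encoder': map text to a count vector over a fixed vocabulary.
    Real encoders (BERT, Word2Vec, ...) produce learned dense vectors instead."""
    tokens = text.lower().split()
    return [tokens.count(term) for term in vocabulary]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vocab = ["software", "engineer", "nurse", "java", "registered"]
v1 = encode("software engineer", vocab)
v2 = encode("java engineer", vocab)
v3 = encode("registered nurse", vocab)
```

Once everything is a vector, comparison is uniform: similar texts land near each other, so "software engineer" sits closer to "java engineer" than to "registered nurse".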
80. Keyword Search vs. Vector Search

• Obscure keyword combinations, e.g. (software OR hardware) AND enginee*
  Likely outcome: keyword search succeeds; vector search fails.
• Natural language queries, e.g. Can my wife drive on my insurance?
  Likely outcome: keyword search might get lucky, but probably fails; vector search succeeds.
• Fuzzy language queries, e.g. famous french tower
  Likely outcome: keyword search mismatch yields poor results; vector search succeeds.
• Structured relationship queries, e.g. popular bbq near Activate
  Likely outcome: keyword search fails; vector search fails; you need a Knowledge Graph!
81. Giant Graph of Relationships...
Trey Grainger works for Lucidworks. He spoke at the Activate 2019 conference. #Activate19
Activate was held in Washington, DC, September 9-12, 2019.
Trey got his masters degree from Georgia Tech.
Trey's Voicemail
83. Documents, Docs-Terms Forward Index, and Terms-Docs Inverted Index

Documents:
• id: 1 | job_title: Software Engineer | desc: software engineer at a great company | skills: .Net, C#, java
• id: 2 | job_title: Registered Nurse | desc: a registered nurse at hospital doing hard work | skills: oncology, phlebotomy
• id: 3 | job_title: Java Developer | desc: a software engineer or a java engineer doing work | skills: java, scala, hibernate

Docs-Terms Forward Index (field / doc / terms):
• desc / 1: a, at, company, engineer, great, software
• desc / 2: a, at, doing, hard, hospital, nurse, registered, work
• desc / 3: a, doing, engineer, java, or, software, work
• job_title / 1: Software Engineer
• …

Terms-Docs Inverted Index (field / term → postings list as doc: positions):
• desc / a → 1: 4; 2: 1; 3: 1, 5
• desc / at → 1: 3; 2: 4
• desc / company → 1: 6
• desc / doing → 2: 6; 3: 8
• desc / engineer → 1: 2; 3: 3, 7
• desc / great → 1: 5
• desc / hard → 2: 7
• desc / hospital → 2: 5
• desc / java → 3: 6
• desc / nurse → 2: 3
• desc / or → 3: 4
• desc / registered → 2: 2
• desc / software → 1: 1; 3: 2
• desc / work → 2: 8; 3: 9
• job_title / java developer → 3: 1
• …

Together, these index structures form the Knowledge Graph.

Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. "The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain". DSAA 2016.
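The forward and inverted index structures above can be rebuilt in a few lines of Python (whitespace tokenization, 1-based positions as on the slide; only the desc field is indexed in this sketch).

```python
from collections import defaultdict

docs = {
    1: "software engineer at a great company",
    2: "a registered nurse at hospital doing hard work",
    3: "a software engineer or a java engineer doing work",
}

forward_index = {}                   # doc -> sorted unique terms
inverted_index = defaultdict(dict)   # term -> {doc: [positions]}
for doc_id, text in docs.items():
    terms = text.split()
    forward_index[doc_id] = sorted(set(terms))
    for pos, term in enumerate(terms, start=1):  # 1-based positions
        inverted_index[term].setdefault(doc_id, []).append(pos)
```

Traversing from a term to its documents (inverted index) and from those documents back out to their other terms (forward index) is exactly the hop that lets the semantic knowledge graph score relationships between any two terms in the corpus.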
84. Related term vector (for query concept expansion)
http://localhost:8983/solr/stack-exchange-health/skg
85. Disambiguation by Category Example
Meaning 1: Restaurant => bbq, brisket, ribs, pork, …
Meaning 2: Outdoor Equipment => bbq, grill, charcoal, propane, …
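A toy sketch of this category-based disambiguation: pick the sense whose related-term list overlaps most with the rest of the query. The related-term lists are the slide's own; in practice they would be mined from the semantic knowledge graph rather than hard-coded.

```python
# Related terms per sense of "bbq" (from the slide)
senses = {
    "Restaurant": {"bbq", "brisket", "ribs", "pork"},
    "Outdoor Equipment": {"bbq", "grill", "charcoal", "propane"},
}

def disambiguate(query_terms):
    """Choose the sense with the largest overlap with the query's context terms."""
    context = set(query_terms) - {"bbq"}
    return max(senses, key=lambda s: len(senses[s] & context))
```

So "bbq ribs brisket" resolves to the Restaurant sense while "bbq charcoal" resolves to Outdoor Equipment, purely from the co-occurring context terms.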
94. Demo Data
Places (also includes geonames database)
Entities (includes search commands)
Text Content
[ Web crawl of restaurant and product reviews sites ]
98. Other Knowledge Graph Search examples:
• popular barbeque near Activate (popular same as "good", "top", "best")
• Hotels near Haystack EU
• hotels near popular BBQ in Berlin
• BBQ near airports near Berlin
• hotels near movie theaters in Berlin …
100. The right ranking algorithm is domain- and context-dependent:
• News search: popularity and freshness drive relevance
• Restaurant search: geographical proximity and price range are critical
• Ecommerce: likelihood of a purchase is key
• Movie search: more popular titles are generally more relevant
• Job search: category of job, salary range, and geographical proximity matter
101. Example Combining Content + Domain + User Context

News website:

/select?
  fq={!cache=false v=$keywords}&
  q={!func}scale(query($keywords),0,25)
    {!func}scale(geodist(),0,25)
    {!func}recip(rord(publicationDate),1,25,0)
    {!func}scale(popularity,0,25)&
  keywords="fall festival"&
  sfield=location&
  pt=33.748,-84.391

Each of the four ranking functions is scaled to contribute an equal 25% of the overall score.

*Example from chapter 16 of Solr in Action
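The weighted combination can be emulated in plain Python. The scale and recip helpers below mimic Solr's function queries of the same names in simplified form (min-max scaling over the result set), and the raw signal values for the three hypothetical articles are made up for illustration; distances are negated so that nearer scores higher.

```python
def scale(values, lo, hi):
    """Min-max scale raw scores into [lo, hi], mimicking Solr's scale()."""
    mn, mx = min(values), max(values)
    span = (mx - mn) or 1.0
    return [lo + (v - mn) * (hi - lo) / span for v in values]

def recip(x, m, a, b):
    """Mimic Solr's recip(x,m,a,b) = a / (m*x + b); decays as x grows."""
    return a / (m * x + b)

# Hypothetical raw signals for three news articles (same document order throughout)
keyword_relevance = scale([12.0, 7.5, 3.0], 0, 25)        # content: query($keywords)
geo_proximity     = scale([-5.0, -120.0, -30.0], 0, 25)   # user context: negated geodist()
freshness         = [recip(rank, 1, 25, 0) for rank in [1, 2, 3]]  # rord(publicationDate)
popularity        = scale([900.0, 100.0, 400.0], 0, 25)   # domain: popularity

combined = [sum(parts) for parts in
            zip(keyword_relevance, geo_proximity, freshness, popularity)]
```

Because each signal is clamped to the same 0-25 range, no single dimension can dominate the final ranking, which is the point of the slide's 25%/25%/25%/25% split.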
102. But how do we figure out the right balance of weights?
103. Learning to Rank

Feedback loop: the user searches → the user sees results → the user takes an action → users' actions inform system improvements.

Query log (User: Query):
• Alonzo: ipad
• Elena: printer
• Ming: ipad
• …

Action log (User, Action, Document):
• Alonzo, click, doc22
• Elena, click, doc17
• Ming, click, doc12
• Alonzo, purchase, doc22
• Ming, click, doc22
• Ming, purchase, doc22
• Elena, click, doc2
• …

Learned feature weights (Feature: Weight):
• title_match_all_terms: 15.25
• exact_phrase_match: 10
• signal_boost: 9.5
• content_age: 9.2
• user_geo_distance: 6.5
• personalization_cat_1: 2.8
• doc_popularity: 2.75
• …

Example query: ipad
Initial results: 1) doc1, 2) doc2, 3) doc3
Build ranking classifier (from implicit relevance judgments)
Final results: 1) doc3, 2) doc1, 3) doc2
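The core of this feedback loop can be sketched by joining the query and action logs into implicit relevance judgments and re-ranking with them. The click/purchase weights below are assumptions for illustration; a real system would train a ranking model (e.g. LambdaMART or a linear learning-to-rank model) over many features rather than summing raw action counts.

```python
from collections import defaultdict

# Query and action logs from the slide, joined into (user, query, action, document)
log = [
    ("Alonzo", "ipad", "click", "doc22"),
    ("Elena", "printer", "click", "doc17"),
    ("Ming", "ipad", "click", "doc12"),
    ("Alonzo", "ipad", "purchase", "doc22"),
    ("Ming", "ipad", "click", "doc22"),
    ("Ming", "ipad", "purchase", "doc22"),
    ("Elena", "printer", "click", "doc2"),
]

ACTION_WEIGHT = {"click": 1.0, "purchase": 5.0}  # assumed relative strengths

def implicit_judgments(query):
    """Aggregate a per-document relevance signal for a query from the log."""
    judgments = defaultdict(float)
    for _, q, action, doc in log:
        if q == query:
            judgments[doc] += ACTION_WEIGHT[action]
    return dict(judgments)

def rerank(query, initial_results):
    """Re-order an initial result list by the implicit judgments."""
    scores = implicit_judgments(query)
    return sorted(initial_results, key=lambda d: scores.get(d, 0.0), reverse=True)
```

Running rerank("ipad", ...) promotes doc22, which was both clicked and purchased by multiple users, mirroring how the slide's final ranking diverges from the initial one.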