While traditional keyword search is still useful, pure text-based keyword matching on its own is no longer enough; today, it is a necessary but not sufficient tool for delivering relevant results and intelligent search experiences.
In this talk, we'll cover some of the emerging trends in AI-powered search, including the use of thought vectors (multi-level vector embeddings) and semantic knowledge graphs to contextually interpret and conceptualize queries. We'll walk through some live query interpretation demos to demonstrate the power that can be delivered through these semantic search techniques leveraging auto-generated knowledge graphs learned from your content and user interactions.
Balancing the Dimensions of User Intent - Trey Grainger
The first step in returning relevant search results is successfully interpreting the user's intent. This requires combining a holistic understanding of your content, your users, and your domain. Traditional keyword search focuses on the content understanding dimension. Knowledge graphs are then typically built and leveraged to represent an understanding of your domain. Finally, collaborative recommendations and user profile learning are typically the tools of choice for generating and modeling an understanding of the preferences of each user.
While these systems (search, recommendations, and knowledge graphs) are often built and used in isolation, combining them together is the key to truly understanding a user's query intent. For example, combining traditional keyword search with your knowledge graph leads to semantic search capabilities, and combining traditional keyword search with recommendations leads to personalized search experiences. Combining all of these dimensions together in an appropriately balanced way will ultimately lead to the most accurate interpretation of a user's query, resulting in a better query to the core search engine and ultimately a better, more relevant search experience.
In this talk, we'll demonstrate strategies for delivering and combining each of these dimensions of user intent, and we'll walk through concrete examples of how to balance the nuances of each so that you don't over-personalize, over-contextualize, or underappreciate the nuances of your user's intent.
Building a semantic search system - one that can correctly parse and interpret end-user intent and return the ideal results for users' queries - is not an easy task. It requires semantically parsing the terms, phrases, and structure within queries, disambiguating polysemous terms, correcting misspellings, expanding to conceptually synonymous or related concepts, and rewriting queries in a way that maps the correct interpretation of each end user's query into the ideal representation of features and weights that will return the best results for that user. Not only that, but the above must often be done within the confines of a very specific domain - ripe with its own jargon and linguistic and conceptual nuances.
This talk will walk through the anatomy of a semantic search system and how each of the pieces described above fit together to deliver a final solution. We'll leverage several recently-released capabilities in Apache Solr (the Semantic Knowledge Graph, Solr Text Tagger, Statistical Phrase Identifier) and Lucidworks Fusion (query log mining, misspelling job, word2vec job, query pipelines, relevancy experiment backtesting) to show you an end-to-end working Semantic Search system that can automatically learn the nuances of any domain and deliver a substantially more relevant search experience.
Opinion-based Article Ranking for Information Retrieval Systems: Factoids and... - Koray Tugberk GUBUR
How Do Search Engines Leverage Opinion-based Articles for Ranking?
Search engines use opinions and factoids to understand consensus. News search engines surface different reports and opinions in their results to satisfy newsreaders' urgent information needs, and they differentiate disinformation from information to protect those readers. Google, Microsoft Bing, Yandex, and DuckDuckGo have different algorithms and priorities for classifying news sources and for prioritizing news and newsworthy topics.
Corroboration of the Web Answers from the Open Web is a research paper by Amélie Marian and Minji Wu that explains how a search engine can rank information according to its accuracy.
Google has started to explain that Expertise-Authoritativeness-Trustworthiness (E-A-T) is the most important group of signals for ensuring that a result won't embarrass the search engine. Embarrassment factors for search engines involve wrong information in a news title or news story, or a wrong featured snippet. A search engine might be embarrassed by a bad result ranking on the SERP.
Related concepts include dense retrieval, context scoring, named entity recognition, semantic role labeling, truth ranges, fix points, confidence scores, query processing, and parsing.
Context understanding requires processing the text and tokenizing the words while recognizing each word's sense. Processing the text of news articles takes time, and most of the time news search engines do not have enough time for it. Thus, PageRank provides a sustainable signal for ranking news sources quickly.
PageRank is a quick signal that shows search engines the authenticity of a news source. Highly cited sources are ranked higher, and for longer, in top stories. Usually, Google protects high-PageRank sources by trusting the judgment of those websites. Fact-finding algorithms, however, mostly do not use PageRank unless they cannot decide by looking at other factors, or they lack the resources to process the text across hundreds of sources.
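As a toy illustration of the citation-based authority idea above (not Google's actual news ranking), here is PageRank computed over a tiny hypothetical citation graph with networkx; the source names are made up.

```python
# A minimal sketch: rank hypothetical news sources by PageRank over a small
# citation graph. Directed edges mean "cites/links to".
import networkx as nx

citations = [
    ("blog-a", "wire-service"),
    ("blog-b", "wire-service"),
    ("local-paper", "wire-service"),
    ("blog-a", "local-paper"),
]

graph = nx.DiGraph(citations)
scores = nx.pagerank(graph, alpha=0.85)

# Higher-scoring sources would be ranked higher (and longer) in top stories.
for source, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{source}: {score:.3f}")
```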
News ranking algorithms differentiate opinions, reports, and breaking news from each other. News-related entities, their co-occurrence, and their contextual relations change over time. Google inventors suggest differentiating these entities from each other for proper news categorization.
News categorization is important for matching users' topics of interest in queryless news feeds such as Google Discover. Google Discover is a queryless news feed that serves news stories according to users' interest areas.
An opinion in the news might be misleading, and some news headlines might be too harsh or one-sided. Search engines use these headlines to differentiate non-trustworthy news sources from trustworthy ones. Journalists' opinions, or their differing interpretations of events, might also change a document's rankings under fact-finding algorithms.
The Python Cheat Sheet for the Busy Marketer - Hamlet Batista
What percentage of an inbound marketer's day doesn't involve working with spreadsheets? How much of this work is time-consuming and repetitive? In this interactive session, you will learn how to manipulate Google Sheets to automate common data analysis workflows using Python, a very easy-to-use programming language.
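As a hedged sketch of the kind of workflow the session describes, the snippet below pulls a Google Sheet into pandas via gspread, aggregates it, and writes a summary back; the sheet title, column names, and credentials path are assumptions, not material from the talk.

```python
# Minimal sketch: Google Sheet -> pandas -> summary -> new worksheet.
import gspread
import pandas as pd

gc = gspread.service_account(filename="credentials.json")  # service-account key (assumed path)
sheet = gc.open("Keyword Report")                           # hypothetical spreadsheet title
worksheet = sheet.sheet1

# Load all rows into a DataFrame for analysis.
df = pd.DataFrame(worksheet.get_all_records())

# Example analysis: total clicks per landing page (column names are assumptions).
summary = df.groupby("landing_page")["clicks"].sum().reset_index()

# Write the summary to a new worksheet.
out = sheet.add_worksheet("summary", rows=len(summary) + 1, cols=2)
rows = [[str(page), int(clicks)] for page, clicks in summary.itertuples(index=False)]
out.append_rows([["landing_page", "clicks"]] + rows)
```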
Lexical Semantics, Semantic Similarity and Relevance for SEO - Koray Tugberk GUBUR
Lexical semantics and the relations between words include relations of superiority, inferiority, part, whole, opposition, and sameness between the meanings of words. The same word can be a meronym, hyponym, or antonym of another word, depending on the word before or after it. The lexical relation value of the first word can affect the structure of the next word, affecting the context of the sentence and the Information Retrieval Score. The Information Retrieval Score determines how closely content is related to a query, how close the different variants of the related query are, and how the structure processed by the search engine's query processor maps to the relevant document. A higher Information Retrieval Score represents better relevance and likely click satisfaction.
The problem with a semi-structured and distracting context for the Information Retrieval Score is that, if a document is not configured for a single topic, the IR Score can be diluted by the two different contexts, resulting in a relative ranking loss to another textual document.
IR Score dilution involves badly structured lexical relations along with poor word proximity. The relevant words that complete each other within the meaning map should be used close together, within a paragraph or section of the document, to signal the context more clearly and increase the IR Score. A search engine can check whether the document contains hyponyms of the words within the query. A possible query prediction can be generated from the hypernyms of the query. A search engine can also check only the anchor texts to see whether there is a word within the "hyponym distance", which represents the hyponym depth between two different words.
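To make the hypernym/hyponym relations above concrete, here is a small illustration using WordNet through NLTK; the example words and the path-length reading of "hyponym distance" are assumptions for demonstration only.

```python
# Requires: nltk.download("wordnet") once beforehand.
from nltk.corpus import wordnet as wn

credit = wn.synsets("credit", pos=wn.NOUN)[0]

# Hypernyms: more general concepts ("credit" is a kind of ...).
print([h.name() for h in credit.hypernyms()])

# Hyponyms: more specific concepts a document about "credit" might also cover.
print([h.name() for h in credit.hyponyms()])

# A crude "hyponym distance": shortest path length between two synsets in the
# WordNet hierarchy (one possible reading of the term used above).
loan = wn.synsets("loan", pos=wn.NOUN)[0]
print(credit.shortest_path_distance(loan))
```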
Lexical relations can represent the semantic annotations for a document. A semantic annotation is a word that describes the document overall in terms of its category and the main context that carries its purpose. A semantic annotation can contain the main entity of the document or a general concept covering a broader meaning area (knowledge domain). Semantic annotations can be generated from the lexical relations between words, and a semantic annotation can be used to match the document to the query. Semantic annotations are factors for a better IR Score.
A search engine can generate phrase patterns from the lexical relationships between words within queries or documents. A phrase pattern contains sections that define a concept with qualifiers. Phrase patterns can contain a hyponym just after an adjective, or a hypernym with the antonym of the same adjective. Many of these connections and patterns are used within recurrent neural networks (RNNs) for next-word prediction. A phrase pattern helps a search engine increase its confidence score for relating a document to a specific query, or to the meaning of the query.
Semantic Content Networks - Ranking Websites on Google with Semantic SEO - Koray Tugberk GUBUR
Semantic Content Networks are semantic networks of things with relations, directed graphs, attributes, and facts. Every declaration and proposition for semantic search represents a factual repository. Open Information Extraction is one methodology for creating a semantic network. The Knowledge Base and the Knowledge Graph are connected to each other in terms of factual repository usage: the Knowledge Base represents a factual repository with descriptions and triples, and the Knowledge Graph is the visualized version of the Knowledge Base. A semantic network is a knowledge representation. A semantic network is useful for understanding the value of an individual node, or the similar and distant members of the same network. Semantic networks are implemented for search engine result pages, and they can be used to create factual and connected question-and-answer networks. A semantic network can be represented by, and consist of, textual and visual content, and it includes lexical parts and lexical units.
Links, nodes, and labels are the parts of a semantic network. Procedural parts are constructors, destructors, writers, and readers; they expand the semantic network and refresh the information in it.
The structural part has the links and nodes. The semantic part has the associated meanings, which are represented as labels.
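A minimal sketch of those parts - nodes, labeled links, and the semantic labels attached to them - using a networkx directed graph; the entities and relations are illustrative, not from any real knowledge base.

```python
# Toy semantic network: nodes are things, directed labeled edges are relations.
import networkx as nx

net = nx.DiGraph()

# Nodes with semantic labels (attributes).
net.add_node("mortgage", type="FinancialProduct")
net.add_node("interest rate", type="Attribute")
net.add_node("bank", type="Organization")

# Structural part: links between nodes. Semantic part: the relation labels.
net.add_edge("mortgage", "interest rate", relation="has_attribute")
net.add_edge("bank", "mortgage", relation="offers")

for head, tail, data in net.edges(data=True):
    print(head, data["relation"], tail)
```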
Semantic content networks have different relations and relation types.
Semantic content networks have AND/OR trees.
Semantic content networks have relation type examples with "is-a" hierarchies.
Semantic content networks have "is-part-of" hierarchies.
Inheritance, reification, multiple inheritance, range queries and values, intersection search, complex semantic networks, inferential distance, partial ordering, semantic distance, and semantic relevance are concepts from semantic networks.
Semantic networks help with understanding semantic search engines and semantic SEO, because they contain all of the related lexical relations, semantic role labels, entity-attribute pairs, and triples of entity, predicate, and object. Search engines prefer to use semantic networks to understand the factuality of a website. Knowledge-Based Trust is related to semantic networks because it provides a factuality-related trust score to balance PageRank; Knowledge-Based Trust was announced by Luna Dong. Ramanathan V. Guha is another inventor from Google and Schema.org who focuses on the semantic web and semantic search engine behavior, and who explored and invented many semantic-search-related concepts.
Semantic Content Networks are used as a concept by Koray Tuğberk GÜBÜR, the founder of Holistic SEO & Digital. Expressing semantic content networks helps to shape semantic networks via textual and visual content pieces. Semantic content networks help shape the truth on the open web and help a search engine rank a website even if there is no external PageRank flow.
Approximate nearest neighbor methods and vector models - NYC ML meetup - Erik Bernhardsson
Nearest neighbors refers to something that is conceptually very simple. For a set of points in some space (possibly many dimensions), we want to find the closest k neighbors quickly.
This presentation covers a library called Annoy, built by me, that helps you do (approximate) nearest neighbor queries in high dimensional spaces. We'll go through vector models, how to measure similarity, and why nearest neighbor queries are useful.
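A minimal usage sketch of Annoy along the lines described above; the dimensionality, number of trees, and random vectors are placeholders.

```python
# Approximate nearest neighbor search with Annoy on random placeholder vectors.
import random
from annoy import AnnoyIndex

dim = 40
index = AnnoyIndex(dim, "angular")  # angular distance ~ cosine similarity

for i in range(1000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(dim)])

index.build(10)  # 10 trees: more trees -> better accuracy, larger index

# The 5 items closest to item 0, found approximately but very quickly.
print(index.get_nns_by_item(0, 5))
```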
Search Query Processing: The Secret Life of Queries, Parsing, Rewriting & SEO - Koray Tugberk GUBUR
Query Processing is the process of query term weight calculation, query augmentation, query context definition, and more. Query understanding and query clustering are related Information Retrieval tasks for search engines. To deliver better search engine optimization efforts and project results, organic search performance optimizers need to understand query processing methodologies. Digital marketing and SEO are connected to each other. Understanding a query includes query parsing, query rewriting, question generation, and answer pairing. Multi-stage Query Processing, Candidate Answer Passages, and Answer Term Weighting are some of the concepts the Google Search Engine uses to parse queries.
The Secret Life of Queries, Parsing, Rewriting & SEO was presented at the Brighton SEO event in April 2022. The talk focused on explaining theoretical SEO and practical SEO examples together.
Query processing methodologies go beyond synonym matching or synonym finding. They involve multiple aspects of words and their meanings: the theme of words, the centrality of words, attention windows, context windows, word co-occurrence matrices, GloVe, Word2Vec, word embeddings, character embeddings, and more.
Themes of words involve word probabilities, as in the Continuous Bag of Words (CBOW) model.
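As a small, hedged illustration of CBOW-style word embeddings, here is gensim's Word2Vec trained in CBOW mode (sg=0) on a toy corpus; real usage needs a much larger corpus.

```python
# Train tiny CBOW word embeddings with gensim; the corpus is a placeholder.
from gensim.models import Word2Vec

sentences = [
    ["query", "processing", "rewrites", "the", "query"],
    ["search", "engine", "parses", "the", "query"],
    ["word", "embeddings", "capture", "word", "context"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=0)

# Words that appear in similar contexts end up with similar vectors.
print(model.wv.most_similar("query", topn=3))
```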
The search engine optimization community focuses on keyword research by matching queries. Query processing involves query word order changes, query word type changes, query word combination changes, query phrase synonym usage, query question generation, and query clustering. Query processing and document processing are correlated: query processing is about understanding a query, while document processing is about processing a web document, and both feed the ranking algorithms. Providing a better ranking algorithm requires better query understanding, and providing better rankings as SEOs requires better search engine understanding. Thus, understanding the methods of query processing is necessary.
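A toy sketch of a few of the transformations listed above (word-order change, synonym substitution, question generation); the synonym table and question template are invented for illustration.

```python
# Generate simple query variants: reorderings, synonym swaps, and a question form.
from itertools import permutations

SYNONYMS = {"cheap": ["affordable", "low cost"], "laptop": ["notebook"]}

def query_variants(query: str):
    tokens = query.lower().split()
    variants = {" ".join(p) for p in permutations(tokens)}          # word order change
    for word, alts in SYNONYMS.items():                             # synonym usage
        if word in tokens:
            for alt in alts:
                variants.add(" ".join(alt if t == word else t for t in tokens))
    variants.add(f"what is the best {' '.join(tokens)}?")           # question generation
    return sorted(variants)

print(query_variants("cheap laptop"))
```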
Search query processing means implementing query processing for the search engines. A search query is the phrase that search engine users type when searching. Search intent understanding and search intent grouping are two different things, but query templates, question templates, and document templates work together. Search queries drive organic search behavior, and a web search engine answers millions of queries every day. Search query processing is therefore a fundamental topic for search engine optimization and search engine result page optimization.
The "Semantic Search Engine: Query Processing" slides from Koray TuÄberk GĂBĂR supported the presentation of "Search Query Processing: The Secret Life of Queries, Parsing, Rewriting & SEO". The presentation has been created by Dear Rebecca Berbel.
Many thanks to the Google engineers who created the semantic search engine patents, including Larry Page.
The Apache Solr Semantic Knowledge Graph - Trey Grainger
What if instead of a query returning documents, you could alternatively return other keywords most related to the query: i.e. given a search for "data science", return me back results like "machine learning", "predictive modeling", "artificial neural networks", etc.? Solr's Semantic Knowledge Graph does just that. It leverages the inverted index to automatically model the significance of relationships between every term in the inverted index (even across multiple fields) allowing real-time traversal and ranking of any relationship within your documents. Use cases for the Semantic Knowledge Graph include disambiguation of multiple meanings of terms (does "driver" mean truck driver, printer driver, a type of golf club, etc.), searching on vectors of related keywords to form a conceptual search (versus just a text match), powering recommendation algorithms, ranking lists of keywords based upon conceptual cohesion to reduce noise, summarizing documents by extracting their most significant terms, and numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. In this talk, we'll do a deep dive into the internals of how the Semantic Knowledge Graph works and will walk you through how to get up and running with an example dataset to explore the meaningful relationships hidden within your data.
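A hedged sketch of how such related-term queries can be issued against Solr's JSON Facet API using the relatedness() aggregate that backs the Semantic Knowledge Graph; the collection name, field name, and foreground/background queries are assumptions.

```python
# Ask Solr for the terms most related to "data science" via a relatedness facet.
import requests

request = {
    "query": 'body:"data science"',
    "limit": 0,
    "params": {"fore": 'body:"data science"', "back": "*:*"},
    "facet": {
        "related_terms": {
            "type": "terms",
            "field": "body",                      # assumed text field
            "limit": 10,
            "sort": {"relatedness": "desc"},
            "facet": {"relatedness": "relatedness($fore,$back)"},
        }
    },
}

resp = requests.post("http://localhost:8983/solr/articles/query", json=request)

# Terms like "machine learning" should score near the top of this facet.
print(resp.json()["facets"]["related_terms"])
```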
Semantic Search Engine: Semantic Search and Query Parsing with Phrases and En... - Koray Tugberk GUBUR
Semantic search engines can understand human language to analyze the need behind a query. Instead of focusing on string or word matching, a semantic search engine focuses on concepts, intents, and the relations of named entities. Taxonomy, ontology, onomastics, semantic role labeling, relation detection, lexical semantics, entity extraction, recognition, and resolution can all be used by semantic search engines. In this PDF file, the evolution of semantic search engines is traced through Google Search's research papers, patents, and official announcements. From 1998 to 2021, the evolution of search and search engines, from strings to things and from phrases to entities, is told along with changes in query processing and parsing methodology.
As opposed to lexical search, semantic searching searches for meaning, not meaningless matches of the query words. Semantic search attempts to increase the relevancy of results by understanding searchers' intents and the context of terms in the searchable dataspace, whether online or within a closed system. The right semantic search content is a blend of natural language, focuses on the intent of the user, and considers other topics the user may be interested in.
Ontologies, XML, and other structured data sources can be used to retrieve knowledge using semantic search according to some authors. The use of such technologies provides a mechanism for creating formal expressions of domain knowledge that are highly expressive and may allow the user to express more detailed intent during query processing.
Natural Language Search with Knowledge Graphs (Activate 2019) - Trey Grainger
To optimally interpret most natural language queries, it's important to understand a highly nuanced, contextual interpretation of the domain-specific phrases, entities, commands, and relationships represented or implied within the search and within your domain.
In this talk, we'll walk through such a search system powered by Solr's Text Tagger and Semantic Knowledge graph. We'll have fun with some of the more search-centric use cases of knowledge graphs, such as entity extraction, query expansion, disambiguation, and pattern identification within our queries: for example, transforming the query "best bbq near activate" into:
{!func}mul(min(popularity,1),100) bbq^0.91032 ribs^0.65674 brisket^0.63386 doc_type:"restaurant" {!geofilt d=50 sfield="coordinates_pt" pt="38.916120,-77.045220"}
We'll see a live demo with real world data demonstrating how you can build and apply your own knowledge graphs to power much more relevant query understanding like this within your search engine.
A pipeline of reading, parsing, optimizing, and storing a log file to parquet.
This script uses the Python pandas library, utilizing the efficient Apache Parquet format for a big speed up and efficient storage.
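A minimal version of that pipeline, assuming a simple space-delimited log file with hypothetical column names; pandas writes the result to Parquet (pyarrow or fastparquet must be installed).

```python
# Read a space-delimited log, fix dtypes, and store it as Parquet.
import pandas as pd

# Assumed format, one record per line:
# 2023-05-01T12:00:00 200 5120 /pricing
columns = ["timestamp", "status", "bytes", "path"]

df = pd.read_csv("access.log", sep=" ", names=columns, header=None)

# Proper dtypes shrink the Parquet file and speed up later analysis.
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["status"] = df["status"].astype("category")

df.to_parquet("access.parquet", index=False)  # requires pyarrow or fastparquet
```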
Google Lighthouse is super valuable but it only checks one page at a time.
Hamlet will show you how to get it to check all pages of a site, and how to run automated Lighthouse checks on-demand at scheduled intervals and from automated tests.
He'll also cover how to set performance budgets, how to get alerts when budgets are exceeded, and how to aggregate page reports using BigQuery and Google Data Studio.
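A hedged sketch of the "all pages" idea: loop over a URL list and shell out to the Lighthouse CLI for each page, saving JSON reports; the URL list file and output directory are assumptions, and scheduling, budgets, and BigQuery aggregation are not shown.

```python
# Run Lighthouse on every URL in urls.txt and print each performance score.
import json
import subprocess
from pathlib import Path

urls = Path("urls.txt").read_text().split()   # assumed: one URL per line
out_dir = Path("lighthouse-reports")
out_dir.mkdir(exist_ok=True)

for i, url in enumerate(urls):
    report = out_dir / f"report-{i}.json"
    subprocess.run(
        ["lighthouse", url, "--output=json", f"--output-path={report}",
         "--chrome-flags=--headless", "--quiet"],
        check=True,
    )
    score = json.loads(report.read_text())["categories"]["performance"]["score"]
    print(url, score)  # e.g. flag pages that fall below a chosen budget
```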
How to approach SEO in a world where Google has moved from strings and keywords to things, topics, and entities. Dixon Jones is the CEO of InLinks, who have built a proprietary NLP algorithm and Knowledge Graph designed for the SEO industry.
Coronavirus and Future of SEO: Digital Marketing and Remote Culture - Koray Tugberk GUBUR
I attended a great SEO and digital marketing webinar with the founder of Stradiji and SEMrush Turkey lead, Mr. Mert Erkal, and my dearest friend and SEO consultant, Atakan Erdoğan.
Small Note: After I uploaded the presentation, Google launched a new Covid-19 news address like Bing/covid-19. You may want to look at it -> https://www.google.com/covid-19
I have prepared a Presentation about Coronavirus's Effects on Search Engine Optimization (SEO).
You will find Coronavirus's changing effects on digital marketing and on the psychology of global society while using search engines.
I have also focused on the reflexes of search engines, social media brands, and e-commerce sites against the Coronavirus pandemic.
You will see the websites and categories that gained traffic and those that lost it. You will also see conversion rate differences caused by Coronavirus.
I have also talked about the differences between search engines, their attitudes toward the Coronavirus pandemic, their future, and their updates during the pandemic.
In the last part, you will see some new 2020 web technology and design trends with AI.
There is also Google research toward better search engine technologies.
Questions:
1- What are the differences between Yandex, Google, Bing, and DuckDuckGo during the Coronavirus pandemic?
2- What are Twitter, Instagram, Amazon, and Apple doing?
3- What do people search for most during the Coronavirus crisis?
4- What changes from country to country?
5- What are the future technologies of web and apps?
6- How and why do search engines improve AI, and what are the latest developments?
7- Which sites lose traffic and which earn more?
8- Lots of quotes from international SEOs about the pandemic.
And more...
I am Koray Tuğberk GÜBÜR, a Holistic SEO expert.
I sincerely thank my dearest friend Atakan Erdoğan and Mr. Mert Erkal for this awesome webinar opportunity and experience.
To watch the webinar, please visit Stradiji's Official Youtube Channel.
https://www.youtube.com/watch?v=V4sJTNcRqaM&t=100s
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine - Trey Grainger
Search engines frequently miss the mark when it comes to understanding user intent. This talk will describe how to overcome this by leveraging Lucene/Solr to power a knowledge graph that can extract phrases, understand and weight the semantic relationships between those phrases and known entities, and expand the query to include those additional conceptual relationships. For example, if a user types in (Senior Java Developer Portland, OR Hadoop), you or I know that the term "senior" designates an experience level, that "java developer" is a job title related to "software engineering", that "portland, or" is a city with a specific geographical boundary, and that "hadoop" is a technology related to terms like "hbase", "hive", and "map/reduce". Out of the box, however, most search engines just parse this query as text:((senior AND java AND developer AND portland) OR (hadoop)), which is not at all what the user intended. We will discuss how to train the search engine to parse the query into this intended understanding, and how to reflect this understanding to the end user to provide an insightful, augmented search experience. Topics: Semantic Search, Finite State Transducers, Probabilistic Parsing, Bayes Theorem, Augmented Search, Recommendations, NLP, Knowledge Graphs
SEO Case Study - Hangikredi.com From 12 March to 24 September Core Update - Koray Tugberk GUBUR
Start Summary:
"131% Organic Session Increase in 5 Months
62% Impression Increase in 5 Months
144% Clicks Increase in 5 Months"
This SEO case study is about Google Core Updates and their impacts on the biggest financial institution website in Turkey.
I started working at Hangikredi.com on 26 March 2019, but the company's website had already been affected very negatively by the 12 March Google Core Update.
I started working there while a crisis was unfolding.
I examined the website and figured out that the real problems were crawl budget, authority signals, and the relevancy-entity connection. I activated social media and Google My Business accounts and entered financial forums and every other alternative channel. I created a news publisher network about us. I cleaned up the misleading status codes and the HTML and CSS mistakes, optimised meta tags, fixed the redirection chains, applied image compression, deleted lots of unnecessary URLs and their contents, and created the internal link structure from scratch.
By the 5 June Google Core Update, we were winners again.
We had regained all of our lost traffic. Until the 1 August server attack we were okay; then, in one day, everything went wrong.
I started from zero again...
I optimised the website's off-page signals to regain the trust of Google's AI, and I supported this strategy with on-page elements.
After the 24 September Google Core Update, there was another success. We broke the site-history records for crawl load/rate, average site position, CTR, impressions, and clicks.
In this case study, you will find the details of an SEO success story with graphics, and also some funny censored images from my life.
End Summary:
"12 March, 5 june and 24 September Google Core Updates with 1 August Server Atack are the milestones of this SEO Casse Study. You will find all details from our view of point. I hope you will like it."
Presentation of the Semantic Knowledge Graph research paper at the 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (Montreal, Canada - October 18th, 2016)
Abstract: This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, any combination of nodes can have edges to any other nodes materialize and be scored to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendations systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) through dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain.
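A toy illustration of the core idea (not the paper's scoring model): with an inverted index, the edge between two terms can be materialized on demand as the intersection of their postings lists and then scored from corpus statistics; the mini-corpus and Jaccard weighting below are simplifications.

```python
# Materialize term-to-term "edges" on demand from an inverted index.
from collections import defaultdict

docs = {
    0: "data science and machine learning",
    1: "machine learning with neural networks",
    2: "golf driver reviews",
    3: "data science predictive modeling",
}

# Inverted index: term -> set of doc ids (postings list).
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def edge(term_a, term_b):
    """Documents supporting a dynamically materialized edge between two nodes."""
    return index[term_a] & index[term_b]

def edge_weight(term_a, term_b):
    """A naive weight: Jaccard overlap of the two postings lists."""
    union = index[term_a] | index[term_b]
    return len(edge(term_a, term_b)) / len(union) if union else 0.0

print(edge("data", "science"), edge_weight("data", "science"))
print(edge("data", "golf"), edge_weight("data", "golf"))
```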
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ... - Dawn Anderson MSc DigM
There are a lot of myths, facts and theories on crawl budget and the term is bandied around a lot. This deck looks to address some of those myths and also looks at some additional theories around the concepts of 'crawl rank' and 'search engine embarrassment'.
Whilst passage indexing may seem like a small tweak to search ranking, it is potentially much more symptomatic of the beginning of a fundamental shift in the way that search engines understand unstructured content, determine relevance in natural language, and rank efficiently and effectively.
It could also be a means of assessing overall quality of content and a means of dynamic index pruning. We will look at the landscape, and also provide some takeaways for brands and business owners looking to improve quality in unstructured content overall in this fast changing landscape.
BrightonSEO March 2021 | Dan Taylor, Image Entity Tags - Dan Taylor
My talk from BrightonSEO 2021; focusing on using Google's image category labels (glancing into the Knowledge Graph and Google's image annotation processes) for better topic research and content optimization.
From knowledge graph to commonsense knowledge graph brighton seo - Dateme Tubotamuno
Knowledge Graphs have vastly improved search in the age of the semantic web. Here come Commonsense Knowledge Graphs, a more dynamic version of today's knowledge graph.
The Next Generation of AI-powered Search - Trey Grainger
What does it really mean to deliver an "AI-powered Search" solution? In this talk, we'll bring clarity to this topic, showing you how to marry the art of the possible with the real-world challenges involved in understanding your content, your users, and your domain. We'll dive into emerging trends in AI-powered Search, as well as many of the stumbling blocks found in even the most advanced AI and Search applications, showing how to proactively plan for and avoid them. We'll walk through the various uses of reflected intelligence and feedback loops for continuous learning from user behavioral signals and content updates, also covering the increasing importance of virtual assistants and personalized search use cases found within the intersection of traditional search and recommendation engines. Our goal will be to provide a baseline of mainstream AI-powered Search capabilities available today, and to paint a picture of what we can all expect just on the horizon.
The Next Generation of AI-Powered Search - Lucidworks
Trey Grainger, Chief Algorithms Officer at Lucidworks, delivers the closing keynote for ACTIVATE 2019, the Search and AI Conference hosted by Lucidworks.
Search Query Processing: The Secret Life of Queries, Parsing, Rewriting & SEOKoray Tugberk GUBUR
Â
Query Processing is the process of query term weight calculation, query augmentation, query context defining, and more. Query understanding and Query clustering are related to Information Retrieval tasks for the search engines. To provide a better search engine optimization effort and project result, the organic search performance optimizers need to implement query processing methodologies. Digital marketing and SEO are connected to each other. Understanding a query includes query parsing, query rewriting, question generation, and answer pairing. Multi-stages Query Processing, Candidate Answer Passages, or Candidate Answer Passages and Answer Term Weighting are some of the concepts from the Google Search Engine to parse the queries.
The presentation of The Secret Life of Queries, Parsing, Rewriting & SEO has been presented at the Brighton SEO Event in April 2022. The event speech focused on explaining the theoretical SEO and practical SEO examples together.
Query Processing methodologies are beyond synonym matching or synonym finding. It involves multiple aspects of the words, and meanings of the words. The theme of words, the centrality of words, attention windows, context windows, and word co-occurrence matrices, GloVe, Word2Vec, word embeddings, character embeddings, and more.
Themes of words contain the word probability like in Continues Bag of Window.
The search engine optimization community focuses on keyword research by matching the queries. Query processing involves query word order change, query word type change, query word combination change, query phrase synonym usage, query question generation, query clustering. Query processing and document processing are correlational. Query processing is to understand a query while document processing is to process a web document. Both of the processes are for ranking algorithms. Providing a better ranking algorithm requires a better query understanding. And providing better rankings as SEOs require better search engine understanding. Thus, understanding the methods of query processing is necessary.
Search Query Processing is implementing the query processing for thesearch engines. Search query refers to the phrase that search engine users use for searching. Search intent understanding and search intent grouping are two different things. But, query templates, questions templates, and document templates work together. Search query is for organic search behaviors. A web search engine answers millions of queries every day. Search query processing is a fundamental task for search engine optimization and search engine result page optimization.
The "Semantic Search Engine: Query Processing" slides from Koray TuÄberk GĂBĂR supported the presentation of "Search Query Processing: The Secret Life of Queries, Parsing, Rewriting & SEO". The presentation has been created by Dear Rebecca Berbel.
Many thanks to the Google engineers that created the Semantic Search Engine patents including Larry Page.
The Apache Solr Semantic Knowledge GraphTrey Grainger
Â
What if instead of a query returning documents, you could alternatively return other keywords most related to the query: i.e. given a search for "data science", return me back results like "machine learning", "predictive modeling", "artificial neural networks", etc.? Solrâs Semantic Knowledge Graph does just that. It leverages the inverted index to automatically model the significance of relationships between every term in the inverted index (even across multiple fields) allowing real-time traversal and ranking of any relationship within your documents. Use cases for the Semantic Knowledge Graph include disambiguation of multiple meanings of terms (does "driver" mean truck driver, printer driver, a type of golf club, etc.), searching on vectors of related keywords to form a conceptual search (versus just a text match), powering recommendation algorithms, ranking lists of keywords based upon conceptual cohesion to reduce noise, summarizing documents by extracting their most significant terms, and numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. In this talk, we'll do a deep dive into the internals of how the Semantic Knowledge Graph works and will walk you through how to get up and running with an example dataset to explore the meaningful relationships hidden within your data.
Semantic Search Engine: Semantic Search and Query Parsing with Phrases and En...Koray Tugberk GUBUR
Â
Semantic Search Engines can understand human language to analyze the need behind a query. Instead of focusing, string, or word matching, a semantic search engine focuses on concepts, intents, and relations of named entities. Taxonomy, ontology, onomastics, semantic role labeling, relation detection, lexical semantics, entity extraction, recognition, resolution can be used by semantic search engines. In this PDF file, semantic search engines' evolution will be processed based on Google Search Engine's research papers, patents, and official announcements. From 1998 to 20021, search's and search engines' evolution, from strings to things, from phrases to entities will be told along with query processing, and parsing methodology changes.
As opposed to lexical search, semantic searching searches for meaning, not meaningless matches of the query words. Semantic search attempts to increase the relevancy of results by understanding searchers' intents and the context of terms in the searchable dataspace, whether online or within a closed system. The right semantic search content is a blend of natural language, focuses on the intent of the user, and considers other topics the user may be interested in.
Ontologies, XML, and other structured data sources can be used to retrieve knowledge using semantic search according to some authors. The use of such technologies provides a mechanism for creating formal expressions of domain knowledge that are highly expressive and may allow the user to express more detailed intent during query processing.
Natural Language Search with Knowledge Graphs (Activate 2019)Trey Grainger
Â
To optimally interpret most natural language queries, its important to understand a highly-nuanced, contextual interpretation of the domain-specific phrases, entities, commands, and relationships represented or implied within the search and within your domain.
In this talk, we'll walk through such a search system powered by Solr's Text Tagger and Semantic Knowledge graph. We'll have fun with some of the more search-centric use cases of knowledge graphs, such as entity extraction, query expansion, disambiguation, and pattern identification within our queries: for example, transforming the query "best bbq near activate" into:
{!func}mul(min(popularity,1),100) bbq^0.91032 ribs^0.65674 brisket^0.63386 doc_type:"restaurant" {!geofilt d=50 sfield="coordinates_pt" pt="38.916120,-77.045220"}
We'll see a live demo with real world data demonstrating how you can build and apply your own knowledge graphs to power much more relevant query understanding like this within your search engine.
A pipeline of reading, parsing, optimizing, and storing a log file to parquet.
This script uses the Python pandas library, utilizing the efficient Apache Parquet format for a big speed up and efficient storage.
Google Lighthouse is super valuable but it only checks one page at a time.
Hamlet will show you how to get it to check all pages of a site, and how to run automated Lighthouse checks on-demand at scheduled intervals and from automated tests.
He'll also cover how to set performance budgets, how to get alerts when budgets are exceeded, and how to aggregate page reports using BigQuery and Google Data Studio.
How to approach SEO in a world where Google has moved from strings and keywords to things, topics and entities. Dixon JOnes is the CEO of InLinks, who have build a proprietory NLP algorithm and Knowledge Graph designed for the SEO Industry.
Coronavirus and Future of SEO: Digital Marketing and Remote CultureKoray Tugberk GUBUR
Â
I have attended a great SEO and Digital Marketing webinar with Founder of Stradiji and SEMRush Turkey Lead Mr. Mert Erkal and My Dearest Friend and SEO Consultant Atakan ErdoÄan.
Small Note: After I uploaded the presentation, Google launched a new Covid-19 news address like Bing/covid-19. You may want to look at it -> https://www.google.com/covid-19
I have prepared a Presentation about Coronavirus's Effects on Search Engine Optimization (SEO).
You will find Coronavirus's changing effects on Digital Marketing and psychology of global society while using Search Engines.
I also have focused on Search Engine's and Social Media Brands, E-commerce Site's reflexes against Coronavirus Pandemic.
You will see the web sites and categories who earn more traffic and lose traffic. You will also see conversion rate differences because of Coronavirus.
Also, I have told about Search Engine's differences and their attitude against the Coronavirus Pandemic, their future, their updates during the pandemic.
In the last part, you will see some new 2020 Web Technology and Design Trends with AI.
There are also Google Researches for better Search Engine technologies.
Questions:
1- What are the differences between Yandex, Google, Bing, and Duckduckgo for Coronavirus Pandemic?
2- Twitter, Instagram, Amazon or Apple, what are they doing?
3- What do people search most for during the Coronavirus Crisis?
4- What changes from country to country?
5- What are the future technologies of Web and App?
6- How and why do Search Engines improve AI, what is the last events?
7- Which sites loose traffic and which earn more?
8- Lots of quotes from International SEOs about the pandemic.
And more...
I am Koray TuÄberk GĂBĂR and a Holistic SEO Expert.
I sincerely thank you for my Dearest Friend Atakan ErdoÄan and Mr. Mert Erkal for this awesome webinar opportunity and experience.
To watch the webinar, please visit Stradiji's Official Youtube Channel.
https://www.youtube.com/watch?v=V4sJTNcRqaM&t=100s
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineTrey Grainger
Â
Search engines frequently miss the mark when it comes to understanding user intent. This talk will describe how to overcome this by leveraging Lucene/Solr to power a knowledge graph that can extract phrases, understand and weight the semantic relationships between those phrases and known entities, and expand the query to include those additional conceptual relationships. For example, if a user types in (Senior Java Developer Portland, OR Hadoop), you or I know that the term âseniorâ designates an experience level, that âjava developerâ is a job title related to âsoftware engineeringâ, that âportland, orâ is a city with a specific geographical boundary, and that âhadoopâ is a technology related to terms like âhbaseâ, âhiveâ, and âmap/reduceâ. Out of the box, however, most search engines just parse this query as text:((senior AND java AND developer AND portland) OR (hadoop)), which is not at all what the user intended. We will discuss how to train the search engine to parse the query into this intended understanding, and how to reflect this understanding to the end user to provide an insightful, augmented search experience. Topics: Semantic Search, Finite State Transducers, Probabilistic Parsing, Bayes Theorem, Augmented Search, Recommendations, NLP, Knowledge Graphs
SEO Case Study - Hangikredi.com From 12 March to 24 September Core UpdateKoray Tugberk GUBUR
Â
Start Summary:
"131% Organic Session Increase in 5 Months
62% Impression Increase in 5 Months
144% Clicks Increase in 5 Months"
This SEO case study is about Google core updates and their impact on the website of the biggest financial institution in Turkey.
I started to work at Hangikredi.com on 26 March 2019, but the company's website had already been affected very negatively by the 12 March Google core update.
I started working there in the middle of a crisis.
I examined the website and figured out that the real problems were crawl budget, authority signals, and the relevancy-entity connection. I activated social media and Google My Business accounts, entered financial forums and every other alternative channel, and created a news publisher network about us. I cleaned up the misleading status codes and the HTML and CSS mistakes, optimised meta tags, fixed the redirection chains, used image compression, deleted lots of unnecessary URLs and their contents, and created the internal link structure from scratch.
By the 5 June Google core update, we were winners again.
We had regained all of our lost traffic. Until the 1 August server attack we were okay; then, in one day, everything went wrong.
I started from zero again...
I optimised the website's off-page signals to regain the trust of Google's AI, and I supported this strategy with on-page elements.
After the 24 September Google core update, there was another success. We broke the site-history records for crawl load/rate, average site position, CTR, impressions, and clicks.
In this case study, you will find the details of an SEO success story with graphics, and also some funny censored images from my life.
End Summary:
"12 March, 5 june and 24 September Google Core Updates with 1 August Server Atack are the milestones of this SEO Casse Study. You will find all details from our view of point. I hope you will like it."
Presentation of the Semantic Knowledge Graph research paper at the 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (Montreal, Canada - October 18th, 2016)
Abstract: This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, any combination of nodes can have edges to any other nodes materialize and be scored to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendations systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) through dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain.
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...Dawn Anderson MSc DigM
There are a lot of myths, facts and theories on crawl budget and the term is bandied around a lot. This deck looks to address some of those myths and also looks at some additional theories around the concepts of 'crawl rank' and 'search engine embarrassment'.
Whilst passage indexing may seem like a small tweak to search ranking, it is potentially much more symptomatic of the beginning of a fundamental shift in the way that search engines understand unstructured content, determine relevance in natural language, and rank efficiently and effectively.
It could also be a means of assessing overall quality of content and a means of dynamic index pruning. We will look at the landscape, and also provide some takeaways for brands and business owners looking to improve quality in unstructured content overall in this fast changing landscape.
BrightonSEO March 2021 | Dan Taylor, Image Entity TagsDan Taylor
My talk from BrightonSEO 2021; focusing on using Google's image category labels (glancing into the Knowledge Graph and Google's image annotation processes) for better topic research and content optimization.
From knowledge graph to commonsense knowledge graph brighton seoDateme Tubotamuno
Knowledge Graphs have vastly improved search in the age of the semantic web. Here come Commonsense Knowledge Graphs, a more dynamic version of today's knowledge graph.
The Next Generation of AI-powered SearchTrey Grainger
What does it really mean to deliver an "AI-powered Search" solution? In this talk, we'll bring clarity to this topic, showing you how to marry the art of the possible with the real-world challenges involved in understanding your content, your users, and your domain. We'll dive into emerging trends in AI-powered Search, as well as many of the stumbling blocks found in even the most advanced AI and Search applications, showing how to proactively plan for and avoid them. We'll walk through the various uses of reflected intelligence and feedback loops for continuous learning from user behavioral signals and content updates, also covering the increasing importance of virtual assistants and personalized search use cases found within the intersection of traditional search and recommendation engines. Our goal will be to provide a baseline of mainstream AI-powered Search capabilities available today, and to paint a picture of what we can all expect just on the horizon.
The Next Generation of AI-Powered SearchLucidworks
Trey Grainger, Chief Algorithms Officer, at Lucidworks delivers the closing keynote for ACTIVATE 2019, the Search and AI Conference hosted by Lucidworks.
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
This presentation is from the inaugural Atlanta Solr Meetup held on 2014/10/21 at Atlanta Tech Village.
Description: CareerBuilder uses Solr to power their recommendation engine, semantic search, and data analytics products. They maintain an infrastructure of hundreds of Solr servers, holding over a billion documents and serving over a million queries an hour across thousands of unique search indexes. Come learn how CareerBuilder has integrated Solr into their technology platform (with assistance from Hadoop, Cassandra, and RabbitMQ) and walk through api and code examples to see how you can use Solr to implement your own real-time recommendation engine, semantic search, and data analytics solutions.
Speaker: Trey Grainger is the Director of Engineering for Search & Analytics at CareerBuilder.com and is the co-author of Solr in Action (2014, Manning Publications), the comprehensive example-driven guide to Apache Solr. His search experience includes handling multi-lingual content across dozens of markets/languages, machine learning, semantic search, big data analytics, customized Lucene/Solr scoring models, data mining and recommendation systems. Trey is also the Founder of Celiaccess.com, a gluten-free search engine, and is a frequent speaker at Lucene and Solr-related conferences.
Reflected intelligence evolving self-learning data systemsTrey Grainger
In this presentation, we'll talk about evolving self-learning search and recommendation systems which are able to accept user queries, deliver relevance-ranked results, and iteratively learn from the users' subsequent interactions to continually deliver a more relevant experience. Such a self-learning system leverages reflected intelligence to consistently improve its understanding of the content (documents and queries), the context of specific users, and the collective feedback from all prior user interactions with the system. Through iterative feedback loops, such a system can leverage user interactions to learn the meaning of important phrases and topics within a domain, identify alternate spellings and disambiguate multiple meanings of those phrases, learn the conceptual relationships between phrases, and even learn the relative importance of features to automatically optimize its own ranking algorithms on a per-query, per-category, or per-user/group basis.
Text Mining & Sentiment Analysis made easy, with Azure and Power BISanil Mhatre
Does your enterprise data include social media posts, product reviews, and survey results with free-form text? Feel like you need a data scientist, equipped with tools like R, text mining, and a sentiment lexicon, to unlock the value of that data? Not anymore! Thanks to our friends at Microsoft, a developer or analyst can leverage Azure Cognitive Services and Power BI to easily analyze text data and create effective visual dashboards.
This session will walk through how to use the text analytics APIs in Azure Cognitive Services to analyze free-form text responses to survey questions. We will learn how to parse key phrases, derive sentiment scores, and effectively use Power BI visualizations like Word Cloud, Gauge, Histograms, etc. to create a dashboard for analyzing this data. We will also browse through performing similar analysis in R.
This tutorial gives an overview of how search engines and machine learning techniques can be tightly coupled to address the need for building scalable recommender or other prediction-based systems. Typically, most such systems architect retrieval and prediction in two phases. In Phase I, a search engine returns the top-k results based on constraints expressed as a query. In Phase II, the top-k results are re-ranked in another system according to an optimization function that uses a supervised trained model. However, this approach presents several issues, such as the possibility of returning sub-optimal results due to the top-k limit applied during the query, as well as the presence of some inefficiencies in the system due to the decoupling of retrieval and ranking.
To address this issue the authors created ML-Scoring, an open source framework that tightly integrates machine learning models into Elasticsearch, a popular search engine. ML-Scoring replaces the default information retrieval ranking function with a custom supervised model, trained through Spark, Weka, or R, that is loaded as a plugin in Elasticsearch. This tutorial will not only review basic methods in information retrieval and machine learning, but will also walk through practical examples, from loading a dataset into Elasticsearch, to training a model in Spark, Weka, or R, to creating the ML-Scoring plugin for Elasticsearch. No prior experience is required in any system listed (Elasticsearch, Spark, Weka, R), though some programming experience is recommended.
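As a rough illustration of the two-phase pattern described above (and of why pushing the model into the engine is attractive), here is a minimal Python sketch; the documents, retrieval function, and "model" are hypothetical stand-ins, not the actual ML-Scoring code.

# Phase I: a toy search engine returns the top-k results for a query.
# Phase II: an external supervised model re-scores only those k results.
documents = {
    1: "software engineer at a great company",
    2: "a registered nurse at hospital doing hard work",
    3: "a software engineer or a java engineer doing work",
}

def phase_one_retrieve(query, k=2):
    """Phase I: naive keyword retrieval, scored by term overlap."""
    q_terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(q_terms & set(text.split()))
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]   # anything below top-k is lost forever

def phase_two_rerank(query, doc_ids):
    """Phase II: re-rank the top-k with a (stand-in) supervised scoring function."""
    def model_score(q, text):
        # pretend this is a trained model; here it simply prefers shorter documents
        return 1.0 / (1 + abs(len(text.split()) - len(q.split())))
    return sorted(doc_ids, key=lambda d: model_score(query, documents[d]), reverse=True)

if __name__ == "__main__":
    query = "java software engineer"
    top_k = phase_one_retrieve(query, k=2)
    print("Phase I results:", top_k)
    print("Phase II re-ranked:", phase_two_rerank(query, top_k))

Because Phase I caps the candidate set at k before the model ever sees it, a document the model would have ranked highly can be cut in Phase I; that is exactly the sub-optimality the plugin-based approach tries to avoid by scoring inside the engine.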
RecSys 2015 Tutorial â Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
Search engines have focused on solving the document retrieval problem, so their scoring functions do not naturally handle non-traditional IR data types, such as numerical or categorical values. Therefore, on domains beyond traditional search, scores representing strengths of associations or matches may vary widely. As such, the original model doesn't suffice, so relevance ranking is performed as a two-phase approach with 1) regular search and 2) an external model to re-rank the filtered items. Metrics such as click-through and conversion rates are associated with the users' response to items served. The predicted selection rates that arise in real time can be critical for optimal matching. For example, in recommender systems, the predicted performance of a recommended item in a given context, also called response prediction, is often used in determining a set of recommendations to serve in relation to a given serving opportunity. Similar techniques are used in the advertising domain. To address this issue the authors have created ML-Scoring, an open source framework that tightly integrates machine learning models into a popular search engine (Solr/Elasticsearch), replacing the default IR-based ranking function. A custom model is trained through either Weka or Spark and is loaded as a plugin used at query time to compute custom scores.
THAT Conference 2021 - State-of-the-art Search with Azure Cognitive SearchBrian McKeiver
In person at THAT Conference 2021 - How to add AI / machine learning to your website search through Azure Cognitive Services with its brand-new semantic search. Join the session to learn why semantic AI-powered search improves the quality of search results.
Slides for VU Web Technology course lecture on "Search on the Web". Explaining how search engines work, some basic information laws and inverted indices.
How to Build a Recommendation Engine on SparkCaserta
How to Build a Recommendation Engine on Spark was a presentation given by Joe Caserta, CEO and founder of Caserta Concepts, at @AnalyticsWeek in Boston.
Boston's Data Analytics Street Conference is a packed 2-day event with thought-provoking keynotes, knowledge-filled sessions, intense workshops, insightful panels, and real-world case studies, engaging the analytics community with the latest methodologies and trends. The conference offers the largest speaker-to-attendee ratio for unmatched networking and learning opportunities.
For more information on the services and solutions Caserta Concepts offers, visit our website at http://casertaconcepts.com/.
PostgreSQL - It's kind've a nifty databaseBarry Jones
This presentation was given to a company that makes software for churches that is considering a migration from SQL Server to PostgreSQL. It was designed to give a broad overview of features in PostgreSQL with an emphasis on full-text search, various datatypes like hstore, array, xml, json as well as custom datatypes, TOAST compression and a taste of other interesting features worth following up on.
Leveraging NLP and Deep Learning for Document Recommendations in the CloudDatabricks
Efficient recommender systems are critical for the success of many industries, such as job recommendation, news recommendation, ecommerce, etc. This talk will illustrate how to build an efficient document recommender system by leveraging Natural Language Processing (NLP) and Deep Neural Networks (DNNs). The end-to-end flow of the document recommender system is built on AWS at scale, using Analytics Zoo for Spark and BigDL. The system first processes text-rich documents into embeddings by incorporating Global Vectors (GloVe), then trains a K-means model using native Spark APIs to cluster users into several groups. The system further trains a recommender model for each group, and gives an ensemble prediction for each test record. By adopting the end-to-end pipeline of the Analytics Zoo solution, we saw about a 10% improvement in mean reciprocal ranking and 6% in precision, respectively, compared to the search recommendations for a job recommendation study.
Speaker: Guoqiong Song
This presentation was given at one of the DSATL Meetups in March 2018 in partnership with Southern Data Science Conference 2018 (www.southerndatascience.com)
Similar to Thought Vectors and Knowledge Graphs in AI-powered Search
Reflected Intelligence: Real world AI in Digital TransformationTrey Grainger
The goal of most digital transformations is to create competitive advantage by enhancing customer experience and employee success, so giving these stakeholders the ability to find the right information at their moment of need is paramount. Employees and customers increasingly expect an intuitive, interactive experience where they can simply type or speak their questions or keywords into a search box, their intent will be understood, and the best answers and content are then immediately presented.
Providing this compelling experience, however, requires a deep understanding of your content, your unique business domain, and the collective and personalized needs of each of your users. Modern artificial intelligence (AI) approaches are able to continuously learn from both your content and the ongoing stream of user interactions with your applications, and to automatically reflect back that learned intelligence in order to instantly and scalably deliver contextually-relevant answers to employees and customers.
In this talk, we'll discuss how AI is currently being deployed across the Fortune 1000 to accomplish these goals, both in the digital workplace (helping employees more efficiently get answers and make decisions) and in digital commerce (understanding customer intent and connecting them with the best information and products). We'll separate fact from fiction as we break down the hype around AI and show how it is being practically implemented today to power many real-world digital transformations for the next generation of employees and customers.
Natural Language Search with Knowledge Graphs (Chicago Meetup)Trey Grainger
To optimally interpret most natural language queries, it's important to understand a highly-nuanced, contextual interpretation of the domain-specific phrases, entities, commands, and relationships represented or implied within the search and within your domain.
In this talk, we'll walk through such a search system powered by Solr's Text Tagger and Semantic Knowledge graph. We'll have fun with some of the more search-centric use cases of knowledge graphs, such as entity extraction, query expansion, disambiguation, and pattern identification within our queries: for example, transforming the query "best bbq near activate" into:
{!func}mul(min(popularity,1),100) bbq^0.91032 ribs^0.65674 brisket^0.63386 doc_type:"restaurant" {!geofilt d=50 sfield="coordinates_pt" pt="38.916120,-77.045220"}
We'll see a live demo with real world data demonstrating how you can build and apply your own knowledge graphs to power much more relevant query understanding like this within your search engine.
Natural Language Search with Knowledge Graphs (Haystack 2019)Trey Grainger
To optimally interpret most natural language queries, it is necessary to understand the phrases, entities, commands, and relationships represented or implied within the search. Knowledge graphs serve as useful instantiations of ontologies which can help represent this kind of knowledge within a domain.
In this talk, we'll walk through techniques to build knowledge graphs automatically from your own domain-specific content, how you can update and edit the nodes and relationships, and how you can seamlessly integrate them into your search solution for enhanced query interpretation and semantic search. We'll have some fun with some of the more search-centric use cases of knowledge graphs, such as entity extraction, query expansion, disambiguation, and pattern identification within our queries: for example, transforming the query "bbq near haystack" into
{ filter:["doc_type":"restaurant"], "query": { "boost": { "b": "recip(geodist(38.034780,-78.486790),1,1000,1000)", "query": "bbq OR barbeque OR barbecue" } } }
We'll also specifically cover use of the Semantic Knowledge Graph, a particularly interesting knowledge graph implementation available within Apache Solr that can be auto-generated from your own domain-specific content and which provides highly-nuanced, contextual interpretation of all of the terms, phrases and entities within your domain. We'll see a live demo with real world data demonstrating how you can build and apply your own knowledge graphs to power much more relevant query understanding within your search engine.
Closing keynote by Trey Grainger from Activate 2018 in Montreal, Canada. Covers trends in the intersection of Search (Information Retrieval) and Artificial Intelligence, and the underlying capabilities needed to deliver those trends at scale.
The Relevance of the Apache Solr Semantic Knowledge GraphTrey Grainger
The Semantic Knowledge Graph is an Apache Solr plugin that can be used to discover and rank the relationships between any arbitrary queries or terms within the search index. It is a relevancy swiss army knife, able to discover related terms and concepts, disambiguate different meanings of terms given their context, cleanup noise in datasets, discover previously unknown relationships between entities across documents and fields, rank lists of keywords based upon conceptual cohesion to reduce noise, summarize documents by extracting their most significant terms, generate recommendations and personalized search, and power numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. This talk will walk you through how to setup and use this plugin in concert with other open source tools (probabilistic query parser, SolrTextTagger for entity extraction) to parse, interpret, and much more correctly model the true intent of user searches than traditional keyword-based search approaches.
"Searching for Meaning: The Hidden Structure in Unstructured Data". Presentation by Trey Grainger at the Southern Data Science Conference (SDSC) 2018. Covers linguistic theory, application in search and information retrieval, and knowledge graph and ontology learning methods for automatically deriving contextualized meaning from unstructured (free text) content.
Building Search & Recommendation EnginesTrey Grainger
In this talk, you'll learn how to build your own search and recommendation engine based on the open source Apache Lucene/Solr project. We'll dive into some of the data science behind how search engines work, covering multi-lingual text analysis, natural language processing, relevancy ranking algorithms, knowledge graphs, reflected intelligence, collaborative filtering, and other machine learning techniques used to drive relevant results for free-text queries. We'll also demonstrate how to build a recommendation engine leveraging the same platform and techniques that power search for most of the world's top companies. You'll walk away from this presentation with the toolbox you need to go and implement your very own search-based product using your own data.
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsTrey Grainger
Search engines, recommendation systems, advertising networks, and even data analytics tools all share the same end goal - to deliver the most relevant information possible to meet a given information need (usually in real-time). Perfecting these systems requires algorithms which can build a deep understanding of the domains represented by the underlying data, understand the nuanced ways in which words and phrases should be parsed and interpreted within different contexts, score the relationships between arbitrary phrases and concepts, continually learn from users' context and interactions to make the system smarter, and generate custom models of personalized tastes for each user of the system.
In this talk, we'll dive into both the philosophical questions associated with such systems ("how do you accurately represent and interpret the meaning of words?", "How do you prevent filter bubbles?", etc.), as well as look at practical examples of how these systems have been successfully implemented in production systems combining a variety of available commercial and open source components (inverted indexes, entity extraction, similarity scoring and machine-learned ranking, auto-generated knowledge graphs, phrase interpretation and concept expansion, etc.).
Self-learned Relevancy with Apache SolrTrey Grainger
Search engines are known for "relevancy", but the relevancy models that ship out of the box (BM25, classic tf-idf, etc.) are just scratching the surface of what's needed for a truly insightful application.
What if your search engine could automatically tune its own domain-specific relevancy model based on user interactions? What if it could learn the important phrases and topics within your domain, learn the conceptual relationships embedded within your documents, and even use machine-learned ranking to discover the relative importance of different features and then automatically optimize its own ranking algorithms for your domain? What if you could further use SQL queries to explore these relationships within your own BI tools and return results in ranked order to deliver relevance-driven analytics visualizations?
In this presentation, we'll walk through how you can leverage the myriad of capabilities in the Apache Solr ecosystem (such as the Solr Text Tagger, Semantic Knowledge Graph, Spark-Solr, Solr SQL, learning to rank, probabilistic query parsing, and Lucidworks Fusion) to build self-learning, relevance-first search, recommendations, and data analytics applications.
Search engines, and Apache Solr in particular, are quickly shifting the focus away from "big data" systems storing massive amounts of raw (but largely unharnessed) content, to "smart data" systems where the most relevant and actionable content is quickly surfaced instead. Apache Solr is the blazing-fast and fault-tolerant distributed search engine leveraged by 90% of Fortune 500 companies. As a community-driven open source project, Solr brings in diverse contributions from many of the top companies in the world, particularly those for whom returning the most relevant results is mission critical.
Out of the box, Solr includes advanced capabilities like learning to rank (machine-learned ranking), graph queries and distributed graph traversals, job scheduling for processing batch and streaming data workloads, the ability to build and deploy machine learning models, and a wide variety of query parsers and functions allowing you to very easily build highly relevant and domain-specific semantic search, recommendations, or personalized search experiences. These days, Solr even enables you to run SQL queries directly against it, mixing and matching the full power of Solr's free-text, geospatial, and other search capabilities with a prominent query language already known by most developers (and which many external systems can use to query Solr directly).
Due to the community-oriented nature of Solr, the ecosystem of capabilities also spans well beyond just the core project. In this talk, we'll also cover several other projects within the larger Apache Lucene/Solr ecosystem that further enhance Solr's smart data capabilities: bi-directional integration of Apache Spark and Solr's capabilities, large-scale entity extraction, semantic knowledge graphs for discovering, traversing, and scoring meaningful relationships within your data, auto-generation of domain-specific ontologies, running SPARQL queries against Solr on RDF triples, probabilistic identification of key phrases within a query or document, conceptual search leveraging Word2Vec, and even Lucidworks' own Fusion project which extends Solr to provide an enterprise-ready smart data platform out of the box.
We'll dive into how all of these capabilities can fit within your data science toolbox, and you'll come away with a really good feel for how to build highly relevant "smart data" applications leveraging these key technologies.
South Big Data Hub: Text Data Analysis PanelTrey Grainger
Slides from Trey's opening presentation for the South Big Data Hub's Text Data Analysis Panel on December 8th, 2016. Trey provided a quick introduction to Apache Solr, described how companies are using Solr to power relevant search in industry, and provided a glimpse on where the industry is heading with regard to implementing more intelligent and relevant semantic search.
Reflected Intelligence: Lucene/Solr as a self-learning data systemTrey Grainger
What if your search engine could automatically tune its own domain-specific relevancy model? What if it could learn the important phrases and topics within your domain, automatically identify alternate spellings (synonyms, acronyms, and related phrases) and disambiguate multiple meanings of those phrases, learn the conceptual relationships embedded within your documents, and even use machine-learned ranking to discover the relative importance of different features and then automatically optimize its own ranking algorithms for your domain?
In this presentation, you'll learn how to do just that: how to evolve Lucene/Solr implementations into self-learning data systems which are able to accept user queries, deliver relevance-ranked results, and automatically learn from your users' subsequent interactions to continually deliver a more relevant experience for each keyword, category, and group of users.
Such a self-learning system leverages reflected intelligence to consistently improve its understanding of the content (documents and queries), the context of specific users, and the relevance signals present in the collective feedback from every prior user interaction with the system. Come learn how to move beyond manual relevancy tuning and toward a closed-loop system leveraging both the embedded meaning within your content and the wisdom of the crowds to automatically generate search relevancy algorithms optimized for your domain.
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Trey Grainger
Search engines frequently miss the mark when it comes to understanding user intent. This talk will walk through some of the key building blocks necessary to turn a search engine into a dynamically-learning "intent engine", able to interpret and search on meaning, not just keywords. We will walk through CareerBuilder's semantic search architecture, including semantic autocomplete, query and document interpretation, probabilistic query parsing, automatic taxonomy discovery, keyword disambiguation, and personalization based upon user context/behavior. We will also see how to leverage an inverted index (Lucene/Solr) as a knowledge graph that can be used as a dynamic ontology to extract phrases, understand and weight the semantic relationships between those phrases and known entities, and expand the query to include those additional conceptual relationships.
As an example, most search engines completely miss the mark at parsing a query like (Senior Java Developer Portland, OR Hadoop). We will show how to dynamically understand that "senior" designates an experience level, that "java developer" is a job title related to "software engineering", that "portland, or" is a city with a specific geographical boundary (as opposed to a keyword followed by a boolean operator), and that "hadoop" is the skill "Apache Hadoop", which is also related to other terms like "hbase", "hive", and "map/reduce". We will discuss how to train the search engine to parse the query into this intended understanding and how to reflect this understanding to the end user to provide an insightful, augmented search experience.
Topics: Semantic Search, Apache Solr, Finite State Transducers, Probabilistic Query Parsing, Bayes Theorem, Augmented Search, Recommendations, Query Disambiguation, NLP, Knowledge Graphs
Semantic & Multilingual Strategies in Lucene/SolrTrey Grainger
When searching on text, choosing the right CharFilters, Tokenizer, stemmers, and other TokenFilters for each supported language is critical. Additional tools of the trade include language detection through UpdateRequestProcessors, parts of speech analysis, entity extraction, stopword and synonym lists, relevancy differentiation for exact vs. stemmed vs. conceptual matches, and identification of statistically interesting phrases per language. For multilingual search, you also need to choose between several strategies such as: searching across multiple fields, using a separate collection per language combination, or combining multiple languages in a single field (custom code is required for this and will be open sourced). These all have their own strengths and weaknesses depending upon your use case. This talk will provide a tutorial (with code examples) on how to pull off each of these strategies as well as compare and contrast the different kinds of stemmers, review the precision/recall impact of stemming vs. lemmatization, and describe some techniques for extracting meaningful relationships between terms to power a semantic search experience per-language. Come learn how to build an excellent semantic and multilingual search system using the best tools and techniques Lucene/Solr has to offer!
Crowdsourced query augmentation through the semantic discovery of domain spec...Trey Grainger
Talk Abstract: Most work in semantic search has thus far focused upon either manually building language-specific taxonomies/ontologies or upon automatic techniques such as clustering or dimensionality reduction to discover latent semantic links within the content that is being searched. The former is very labor intensive and hard to maintain, while the latter is prone to noise and may be hard for a human to understand or to interact with directly. We believe that the links between similar users' queries represent a largely untapped source for discovering latent semantic relationships between search terms. The proposed system is capable of mining user search logs to discover semantic relationships between key phrases in a manner that is language agnostic, human understandable, and virtually noise-free.
Enhancing relevancy through personalization & semantic searchTrey Grainger
Matching keywords is just step one in the effort to maximize the relevancy of your search platform. In this talk, you'll learn how to implement advanced relevancy techniques which enable your search platform to "learn" from your content and users' behavior. Topics will include automatic synonym discovery, latent semantic indexing, payload scoring, document-to-document searching, foreground vs. background corpus analysis for interesting term extraction, collaborative filtering, and mining user behavior to drive geographically and conceptually personalized search results. You'll learn how CareerBuilder has enhanced Solr (also utilizing Hadoop) to dynamically discover relationships between data and behavior, and how you can implement similar techniques to greatly enhance the relevancy of your search platform.
Building a real time big data analytics platform with solrTrey Grainger
Having "big data" is great, but turning that data into actionable intelligence is where the real value lies. This talk will demonstrate how you can use Solr to build a highly scalable data analytics engine to enable customers to engage in lightning fast, real-time knowledge discovery.
At CareerBuilder, we utilize these techniques to report the supply and demand of the labor force, compensation trends, customer performance metrics, and many live internal platform analytics. You will walk away from this talk with an advanced understanding of faceting, including pivot-faceting, geo/radius faceting, time-series faceting, function faceting, and multi-select faceting. You'll also get a sneak peek at some new faceting capabilities just wrapping up development, including distributed pivot facets and percentile/stats faceting, which will be open-sourced.
The presentation will be a technical tutorial, along with real-world use-cases and data visualizations. After this talk, you'll never see Solr as just a text search engine again.
Building a real time, solr-powered recommendation engineTrey Grainger
Searching text is what Solr is known for, but did you know that many companies receive an equal or greater business impact through implementing a recommendation engine in addition to their text search capabilities? With a few tweaks, Solr (or Lucene) can also serve as a full featured recommendation engine. Machine learning libraries like Apache Mahout provide excellent behavior-based, off-line recommendation algorithms, but what if you want more control? This talk will demonstrate how to effectively utilize Solr to perform collaborative filtering (users who liked this also liked...), categorical classification and subsequent hierarchical-based recommendations, as well as related-concept extraction and concept based recommendations. Sound difficult? It's not. Come learn step-by-step how to create a powerful real-time recommendation engine using Apache Solr and see some real-world examples of some of these strategies in action.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Dev Dives: Train smarter, not harder â active learning and UiPath LLMs for do...UiPathCommunity
Speed, accuracy, and scaling: discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing, with little to no training required
Get an exclusive demo of the new family of UiPath LLMs: GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
Andras Palfi, Senior Product Manager, UiPath
Lenka Dulovicova, Product Program Manager, UiPath
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients' needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects' efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you're in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part "Essentials of Automation" series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here's what you'll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We'll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don't miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
2. Trey Grainger
Chief Algorithms Officer
• Previously: SVP of Engineering @ Lucidworks; Director of Engineering @ CareerBuilder
• Georgia Tech - MBA, Management of Technology
• Furman University - BA, Computer Science, Business, & Philosophy
• Stanford University - Information Retrieval & Web Search
Other fun projects:
• Co-author of Solr in Action, plus numerous research publications
• Advisor to Presearch, the decentralized search engine
• Lucene / Solr contributor
About Me
4. • About Lucidworks
• What is AI-powered Search?
• What are Thought Vectors?
• Vector Search in Apache Solr
• What is a Knowledge Graph (and related terminology)?
• Philosophy of Language
• Semantic Knowledge Graphs
• Implementing Knowledge Graph-based Search
• Solr Text Tagger
• Solr's Semantic Knowledge Graph
• Demos!
Agenda
5. Who are we?
The company behind The Search & AI Conference
300+ customers across the Fortune 1000
400+ employees
Offices in: San Francisco, CA (HQ); Raleigh-Durham, NC; Cambridge, UK; Bangalore, India; Hong Kong
Employ about 40% of the active committers on the Solr project
Contribute over 70% of Solr's open source codebase
Develop & support Apache Solr
7. Proudly built with open-source tech at its core: Apache Solr & Apache Spark
Personalizes search with applied machine learning
Proven on the world's biggest information systems
15. An inverted index ("how a search engine works")

What you SEND to Lucene/Solr:

Document | Content Field
doc1 | once upon a time, in a land far, far away
doc2 | the cow jumped over the moon.
doc3 | the quick brown fox jumped over the lazy dog.
doc4 | the cat in the hat
doc5 | The brown cow said "moo" once.
... | ...

How the content is INDEXED into Lucene/Solr (conceptually):

Term | Documents
a | doc1 [2x]
brown | doc3 [1x], doc5 [1x]
cat | doc4 [1x]
cow | doc2 [1x], doc5 [1x]
... | ...
once | doc1 [1x], doc5 [1x]
over | doc2 [1x], doc3 [1x]
the | doc2 [2x], doc3 [2x], doc4 [2x], doc5 [1x]
... | ...
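To make the diagram above concrete, here is a small, purely illustrative Python sketch that builds the same kind of term-to-documents mapping from the example documents (this is a conceptual illustration, not how Lucene actually stores its index).

from collections import defaultdict

# What you SEND to the engine: documents with a content field
docs = {
    "doc1": "once upon a time, in a land far, far away",
    "doc2": "the cow jumped over the moon.",
    "doc3": "the quick brown fox jumped over the lazy dog.",
    "doc4": "the cat in the hat",
    "doc5": 'The brown cow said "moo" once.',
}

def tokenize(text):
    # extremely naive analysis: lowercase and strip punctuation
    return [t.strip('.,"') for t in text.lower().split() if t.strip('.,"')]

# How it is conceptually INDEXED: term -> {doc -> occurrence count}
inverted_index = defaultdict(lambda: defaultdict(int))
for doc_id, text in docs.items():
    for term in tokenize(text):
        inverted_index[term][doc_id] += 1

for term in ("brown", "cow", "once", "the"):
    postings = ", ".join(f"{d} [{n}x]" for d, n in sorted(inverted_index[term].items()))
    print(f"{term}: {postings}")
# e.g. brown: doc3 [1x], doc5 [1x]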
17. BM25 (Relevance Scoring between Query and Documents)

Score(q, d) = sum[t in q] idf(t) · ( tf(t in d) · (k + 1) ) / ( tf(t in d) + k · (1 - b + b · |d| / avgdl) )

Where:
t = term; d = document; q = query; i = index
tf(t in d) = numTermOccurrencesInDocument^(1/2)
idf(t) = 1 + log( numDocs / (docFreq + 1) )
|d| = sum[t in d] 1 (the length of document d, in terms)
avgdl = ( sum[d in i] |d| ) / ( sum[d in i] 1 ) (the average document length in the index)
k = free parameter, usually ~1.2 to 2.0; increases the term-frequency saturation point.
b = free parameter, usually ~0.75; increases the impact of document-length normalization.
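As a sanity check on the formula above, here is a small Python sketch that computes BM25-style scores over a toy corpus. It follows the slide's definitions with k=1.2 and b=0.75, but uses the raw term frequency rather than the dampened square-root term frequency shown above, so treat it as illustrative only, not as Lucene's exact implementation.

import math
from collections import Counter

docs = {
    "doc1": "the cow jumped over the moon",
    "doc2": "the quick brown fox jumped over the lazy dog",
    "doc3": "the brown cow said moo once",
}
tokenized = {d: text.split() for d, text in docs.items()}
N = len(tokenized)                                   # numDocs
avgdl = sum(len(t) for t in tokenized.values()) / N  # average document length

def idf(term):
    doc_freq = sum(1 for terms in tokenized.values() if term in terms)
    return 1 + math.log(N / (doc_freq + 1))

def bm25_score(query, doc_id, k=1.2, b=0.75):
    terms = tokenized[doc_id]
    counts = Counter(terms)
    dl = len(terms)                                  # |d|
    score = 0.0
    for t in query.split():
        tf = counts[t]                               # raw term frequency in this document
        if tf == 0:
            continue
        score += idf(t) * (tf * (k + 1)) / (tf + k * (1 - b + b * dl / avgdl))
    return score

for d in docs:
    print(d, round(bm25_score("brown cow", d), 3))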
29. Vector Similarity Scores: Performance Considerations

Problem: Vector Scoring is Slow
• Unlike keyword search, which looks up pre-indexed answers to queries, vector search must instead calculate similarities between the query vector and every document's vectors to determine the best matches, which is slow at scale.

Solution: Quantized Vectors
• "Quantization" is the process of mapping vector features to discrete values.
• Creating "tokens" which map to a similar vector space enables matching on those tokens to perform an ANN (Approximate Nearest Neighbor) search.
• This converts vector scoring into a search problem (term lookup and scoring), which is fast again, at the expense of some recall and scoring accuracy.

Recommended Approach: Quantized Vector Search + Vector Similarity Reranking
• Combine the best of both worlds by running an initial ANN search on a quantized vector representation, and then re-rank the top-N results using full vector similarity scoring.
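A minimal numpy sketch of that recommended approach, assuming nothing about Solr's actual implementation: quantize each vector into coarse one-bit-per-dimension codes, use Hamming similarity over those codes as a cheap approximate first pass, then re-rank the top-N candidates with full cosine similarity. Real systems use far better quantizers (product quantization, HNSW, etc.); this only illustrates the shape of the idea.

import numpy as np

rng = np.random.default_rng(0)
doc_vectors = rng.normal(size=(10_000, 64))          # pretend embeddings for 10k documents
doc_codes = doc_vectors > 0                          # crude "quantization": one bit per dimension

def ann_then_rerank(query_vec, top_n=100, final_k=10):
    # Approximate stage: Hamming similarity on the quantized codes (cheap, lossy)
    query_code = query_vec > 0
    hamming_sim = (doc_codes == query_code).sum(axis=1)
    candidates = np.argsort(-hamming_sim)[:top_n]

    # Re-ranking stage: full cosine similarity, but only over the top_n candidates
    cand = doc_vectors[candidates]
    cos = cand @ query_vec / (np.linalg.norm(cand, axis=1) * np.linalg.norm(query_vec))
    order = np.argsort(-cos)[:final_k]
    return candidates[order], cos[order]

query = rng.normal(size=64)
ids, scores = ann_then_rerank(query)
print(ids[:5], np.round(scores[:5], 3))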
46. Vector Encoders
• Take queries, documents, sentences, paragraphs, etc. and transform them into vectors.
• Usually leverage deep learning, which can discover rich language usage rules and map them to combinations of features in the vector.
• Popular libraries:
  • BERT
  • ELMo
  • Universal Sentence Encoder
  • Word2Vec
  • Sentence2Vec
  • GloVe
  • fastText
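For example, using the sentence-transformers package as a stand-in for the encoder libraries listed above (the model name below is an assumption; any sentence-level encoder works the same way), turning text into vectors and comparing them takes only a few lines:

# Hypothetical example: sentence-transformers stands in for the encoders listed above.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")    # assumed model name

texts = [
    "Can my wife drive on my insurance?",
    "Is my spouse covered when driving my car?",
    "famous french tower",
]
vectors = model.encode(texts)                      # one dense vector per input text

# Semantically similar texts land near each other in the vector space
print(util.cos_sim(vectors[0], vectors[1]))        # relatively high similarity
print(util.cos_sim(vectors[0], vectors[2]))        # relatively low similarity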
47. Keyword Search vs. Vector Search

Query Type: Obscure keyword combinations
  Q. (software OR hardware) AND enginee*
  Likely Outcome: Keyword search succeeds; Vector search fails

Query Type: Natural Language Queries
  Q. Can my wife drive on my insurance?
  Likely Outcome: Keyword search might get lucky, but probably fails; Vector search succeeds

Query Type: Fuzzy Language Queries
  Q. famous french tower
  Likely Outcome: Keyword search mismatch yields poor results; Vector search succeeds

Query Type: Structured Relationship Queries
  Q. popular barbeque near Activate
  Likely Outcome: Keyword search fails; Vector search fails; Need a Knowledge Graph!
48. What is a Knowledge Graph?
(vs. Ontology vs. Taxonomy vs. Synonyms, etc.)
50. Overly Simplistic Definitions
Alternative Labels: Substitute words with identical meanings
[ CTO => Chief Technology Officer; specialise => specialize ]
Synonyms List: Provides substitute words that can be used to represent the same or very similar things
[ human => homo sapien, mankind; food => sustenance, meal ]
Taxonomy: Classifies things into categories
[ john is Human; Human is Mammal; Mammal is Animal ]
Ontology: Defines relationships between types of things
[ animal eats food; human is animal ]
Knowledge Graph: Instantiation of an Ontology (contains the things that are related)
[ john is human; john eats food ]
In practice, there is significant overlap...
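To see how these layers stack, here is a tiny Python sketch that stores the slide's own examples as triples: the taxonomy and ontology describe categories and types, while the knowledge graph instantiates them with concrete things. Purely illustrative; a real system would use RDF/OWL or a graph database.

# Each fact is a (subject, predicate, object) triple, using the slide's own examples.
taxonomy = [            # classifies things into categories
    ("john", "is_a", "Human"),
    ("Human", "is_a", "Mammal"),
    ("Mammal", "is_a", "Animal"),
]
ontology = [            # relationships between TYPES of things
    ("Animal", "eats", "Food"),
    ("Human", "is_a", "Animal"),
]
knowledge_graph = [     # instantiation of the ontology: concrete things that are related
    ("john", "is_a", "Human"),
    ("john", "eats", "food"),
]

def related(triples, subject):
    """Everything directly connected to a subject."""
    return [(p, o) for s, p, o in triples if s == subject]

print(related(knowledge_graph, "john"))   # [('is_a', 'Human'), ('eats', 'food')]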
52. What sort of Knowledge Graph can
help us with the kinds of problems we
encounter in Search use cases?
55. But... unstructured data is really more like "hyper-structured" data. It is a graph that contains much more structure than typical "structured data."
56. Structured Data

Employees Table
id | name | company | start_date
lw100 | Trey Grainger | 1234 | 2016-02-01
dis2 | Mickey Mouse | 9123 | 1928-11-28
tsla1 | Elon Musk | 5678 | 2003-07-01

Companies Table
id | name | start_date
1234 | Lucidworks | 2016-02-01
5678 | Tesla | 1928-11-28
9123 | Disney | 2003-07-01

(Slide callouts label the discrete values, the continuous values, and the foreign key linking the company column to the Companies table.)
57. Unstructured Data
Trey Grainger works at Lucidworks.
He is speaking at the Activate 2019 conference.
#Activate19 (Activate) is being held in
Washington, DC September 9-12, 2019. Trey got
his masters from Georgia Tech.
58. Trey Grainger works for Lucidworks.
He is speaking at the Activate 2019
conference.
#Activate19
(Activate) is being held in Washington, DC
September 9-12, 2019.
Trey got his masters degree from
Georgia Tech.
Trey's Voicemail
Unstructured Data
59. Trey Grainger works for Lucidworks.
He is speaking at the Activate 2019
conference.
#Activate19
(Activate) is being held in Washington, DC
September 9-12, 2019.
Trey got his masters degree from
Georgia Tech.
Trey's Voicemail
Foreign Key?
60. Trey Grainger works for Lucidworks.
He is speaking at the Activate 2019
conference.
#Activate19
(Activate) is being held in Washington, DC
September 9-12, 2019.
Trey got his masters degree from
Georgia Tech.
Trey's Voicemail
Fuzzy Foreign Key? (Entity Resolution)
61. Trey Grainger works for Lucidworks.
He is speaking at the Activate 2019
conference.
#Activate19
(Activate) is being held in Washington, DC
September 9-12, 2019.
Trey got his masters degree from
Georgia Tech.
Trey's Voicemail
Fuzzier Foreign Key? (metadata, latent features)
62. Trey Grainger works for Lucidworks.
He is speaking at the Activate 2019
conference.
#Activate19
(Activate) is being held in Washington, DC
September 9-12, 2019.
Trey got his masters degree from
Georgia Tech.
Trey's Voicemail
Fuzzier Foreign Key? (metadata, latent features)
Not so fast!
65. Giant Graph of Relationships...
Trey Grainger works for Lucidworks.
He is speaking at the Activate 2019
conference.
#Activate19
(Activate) is being held in Washington, DC
September 9-12, 2019.
Trey got his masters degree from
Georgia Tech.
Trey's Voicemail
66. How do we easily harness this "semantic graph" of relationships within unstructured information?
68. Documents, Docs-Terms Forward Index, and Terms-Docs Inverted Index

Documents:
doc 1 - job_title: Software Engineer | desc: software engineer at a great company | skills: .Net, C#, java
doc 2 - job_title: Registered Nurse | desc: a registered nurse at hospital doing hard work | skills: oncology, phlebotemy
doc 3 - job_title: Java Developer | desc: a software engineer or a java engineer doing work | skills: java, scala, hibernate

Docs-Terms Forward Index (field: desc):
doc 1: a, at, company, engineer, great, software
doc 2: a, at, doing, hard, hospital, nurse, registered, work
doc 3: a, doing, engineer, java, or, software, work
(field: job_title) doc 1: Software Engineer; ...

Terms-Docs Inverted Index (field: desc; postings list shown as doc [positions]):
a: doc 1 [4], doc 2 [1], doc 3 [1, 5]
at: doc 1 [3], doc 2 [4]
company: doc 1 [6]
doing: doc 2 [6], doc 3 [8]
engineer: doc 1 [2], doc 3 [3, 7]
great: doc 1 [5]
hard: doc 2 [7]
hospital: doc 2 [5]
java: doc 3 [6]
nurse: doc 2 [3]
or: doc 3 [4]
registered: doc 2 [2]
software: doc 1 [1], doc 3 [2]
work: doc 2 [10], doc 3 [9]
(field: job_title) java developer: doc 3 [1]; ...

Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. "The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain". DSAA 2016.
69. How the Graph Traversal Works

(Figure: the same relationships shown as a Set-theory View, a Graph View, and a Data Structure View. Skill nodes such as "Java", "Scala", "Hibernate", and "Oncology" gain has_related_skill edges that materialize from the documents (docs 1-6) their postings lists share.)

Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. "The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain". DSAA 2016.
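The set-theory view in the figure is just intersections of postings lists. Here is a hypothetical Python sketch of that idea: an edge between two skill nodes "materializes" whenever the sets of documents containing them overlap, and its weight can be as simple as the size of the overlap (the data below is shaped like the figure's example, not the exact slide values).

# Postings lists: skill -> set of documents containing that skill (toy data)
postings = {
    "Java":      {1, 2, 3, 4, 6},
    "Scala":     {3, 4},
    "Hibernate": {3, 4},
    "Oncology":  {5},
}

def materialize_edges(start_node):
    """Edges from start_node to every other node, scored by shared documents."""
    start_docs = postings[start_node]
    edges = {}
    for node, docs in postings.items():
        if node == start_node:
            continue
        shared = start_docs & docs          # the edge IS the intersecting documents
        if shared:
            edges[node] = len(shared)       # simple overlap weight (the real SKG uses relatedness)
    return edges

print(materialize_edges("Java"))      # {'Scala': 2, 'Hibernate': 2}
print(materialize_edges("Oncology"))  # {} : no shared documents, so no edges materialize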
70. Graph Traversal

(Figure: Data Structure View and Graph View of a traversal that starts from the skill "Java" and alternates inverted index lookups and forward index lookups across docs 1-6, materializing has_related_skill edges to "Scala" and "Hibernate" and has_related_job_title edges to "Java Developer", "Software Engineer", and "Data Scientist".)

DOI: 10.1109/DSAA.2016.51
Conference: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. "The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain". DSAA 2016.
71. Scoring of Node Relationships (Edge Weights)
Foreground vs. Background Analysis
Every term is scored against its context. The more commonly the term appears within its foreground context versus its background context, the more relevant it is to the specified foreground context.
countFG(x) - totalDocsFG * probBG(x)
z = --------------------------------------------------------
sqrt(totalDocsFG * probBG(x) * (1 - probBG(x)))
{ "type":"keywordsâ, "values":[
{ "value":"hive", "relatedness":0.9773, "popularity":369 },
{ "value":"java", "relatedness":0.9236, "popularity":15653 },
{ "value":".net", "relatedness":0.5294, "popularity":17683 },
{ "value":"bee", "relatedness":0.0, "popularity":0 },
{ "value":"teacher", "relatedness":-0.2380, "popularity":9923 },
{ "value":"registered nurse", "relatedness": -0.3802 "popularity":27089 } ] }
We are essentially boosting terms which are more related to some known feature
(and ignoring terms which are equally likely to appear in the background corpus)
Foreground Query: "Hadoop"
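The relatedness ranking shown in the response above comes from the z-score formula on this slide. A minimal Python sketch, assuming you already have document counts for a foreground set (e.g. documents matching "Hadoop") and for the background corpus; the counts below are hypothetical:

import math

def relatedness(count_fg, total_docs_fg, count_bg, total_docs_bg):
    """z-score of a term's foreground frequency vs. its background probability."""
    prob_bg = count_bg / total_docs_bg
    expected = total_docs_fg * prob_bg
    return (count_fg - expected) / math.sqrt(expected * (1 - prob_bg))

# Hypothetical counts: "hive" appears in 350 of 400 "Hadoop" docs,
# but in only 369 of 100,000 docs overall, so it scores as strongly foreground-related.
print(round(relatedness(350, 400, 369, 100_000), 2))

# "teacher" appears in 30 of the 400 foreground docs but 9,923 of 100,000 overall:
# it is less common in the foreground than expected, so the score goes negative.
print(round(relatedness(30, 400, 9_923, 100_000), 2))

Roughly speaking, Solr's implementation further normalizes this z value into the bounded relatedness range seen in the JSON response above, but the ranking intuition is the same.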
72. Related term vector (for query concept expansion)
http://localhost:8983/solr/stack-exchange-health/skg
75. Differentiating related terms
Misspellings: managr => manager
Synonyms: cpa => certified public accountant
rn => registered nurse
r.n. => registered nurse
Ambiguous Terms*: driver => driver (trucking) ~80% likelihood
driver => driver (software) ~20% likelihood
Related Terms: r.n. => nursing, bsn
hadoop => mapreduce, hive, pig
*differentiated based upon user and query context
76. Thought Exercise
What do you think of when I say the word "driver"?
What about "architect"?
77. Use Case: Query Disambiguation
Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
Example Related Keywords (representing multiple meanings)
driver truck driver, linux, windows, courier, embedded, cdl,
delivery
architect autocad drafter, designer, enterprise architect, java
architect, designer, architectural designer, data architect,
oracle, java, architectural drafter, autocad, drafter, cad,
engineer
… …
80. Semantic Knowledge Graph: Discovering ambiguous phrases
1) Exact same concept, but use a document classification field (i.e. category) as the first level of your graph, and the related terms as the second level to which you traverse.
2) Has the benefit that you don't need query logs to mine, but it will be representative of your data, as opposed to your users' intent, so the quality depends on how clean and representative your documents are.
Additional Benefit: Multi-dimensional disambiguation and dynamic materialization of categories. Effectively a dynamically-materialized probabilistic graphical model.
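As a sketch of that two-level traversal (again expressed with stock Solr's relatedness() aggregation rather than the custom /skg handler), the first facet level is the classification field and the second level is the related terms within each category bucket; the collection and field names below are assumptions.

import requests

skg_request = {
    "query": "text:bbq",                          # the ambiguous phrase as foreground
    "limit": 0,
    "params": {"fore": "text:bbq", "back": "*:*"},
    "facet": {
        "categories": {                           # level 1: document classification field
            "type": "terms",
            "field": "category",
            "limit": 5,
            "sort": "r desc",
            "facet": {
                "r": "relatedness($fore,$back)",
                "sense_terms": {                  # level 2: terms defining that sense
                    "type": "terms",
                    "field": "keywords",
                    "limit": 8,
                    "sort": "r2 desc",
                    "facet": {"r2": "relatedness($fore,$back)"}
                }
            }
        }
    }
}

resp = requests.post("http://localhost:8983/solr/reviews/query", json=skg_request, timeout=10)
for bucket in resp.json()["facets"]["categories"]["buckets"]:
    print(bucket["val"], "=>", [t["val"] for t in bucket["sense_terms"]["buckets"]])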
81. Disambiguation by Category Example
Meaning 1: Restaurant => bbq, brisket, ribs, pork, …
Meaning 2: Outdoor Equipment => bbq, grill, charcoal, propane, …
82. Disambiguated meanings (represented as term vectors)
Example Related Keywords (Disambiguated Meanings)
architect 1: enterprise architect, java architect, data architect, oracle, java, .net
2: architectural designer, architectural drafter, autocad, autocad drafter, designer,
drafter, cad, engineer
driver 1: linux, windows, embedded
2: truck driver, cdl driver, delivery driver, class b driver, cdl, courier
designer 1: design, print, animation, artist, illustrator, creative, graphic artist, graphic,
photoshop, video
2: graphic, web designer, design, web design, graphic design, graphic designer
3: design, drafter, cad designer, draftsman, autocad, mechanical designer, proe,
structural designer, revit
… …
Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
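Once each sense is captured as a term vector like the rows above, choosing the right sense for a given user is a comparison of the query/session context against each vector. A minimal sketch, with the vectors abbreviated from the table and simple Jaccard overlap standing in for whatever weighted similarity a production system would use:

senses = {
    "driver (software)": {"linux", "windows", "embedded"},
    "driver (trucking)": {"truck driver", "cdl driver", "delivery driver",
                          "class b driver", "cdl", "courier"},
}

def pick_sense(context_terms):
    """Score each candidate sense by Jaccard overlap with the user's context terms."""
    context = set(context_terms)
    scores = {sense: len(terms & context) / len(terms | context)
              for sense, terms in senses.items()}
    best = max(scores, key=scores.get)
    return best, scores

print(pick_sense({"cdl", "courier"}))        # hypothetical session context -> trucking sense
print(pick_sense({"linux", "embedded"}))     # -> software sense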
83. Every term or phrase is a context-dependent cluster of meaning with an ambiguous label
85. What does "love" mean in the context of "hug"?
http://localhost:8983/solr/thesaurus/skg
"embrace"
86. What does "love" mean in the context of "child"?
http://localhost:8983/solr/thesaurus/skg
87. So what's the end goal here?
User's Query:
machine learning research and development Portland, OR software
engineer AND hadoop, java
Traditional Query Parsing:
(machine AND learning AND research AND development AND portland)
OR (software AND engineer AND hadoop AND java)
Semantic Query Parsing:
"machine learning" AND "research and development" AND "Portland, OR"
AND "software engineer" AND hadoop AND java
Semantically Expanded Query:
"machine learning"^10 OR "data scientist" OR "data mining" OR "artificial intelligence")
AND ("research and development"^10 OR "r&d") AND
AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo})
AND ("software engineer"^10 OR "software developer")
AND (hadoop^10 OR "big data" OR hbase OR hive) AND (java^10 OR j2ee)
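The final rewrite step can be as simple as stitching per-concept expansions back into a boolean query string. A toy sketch of that assembly follows; the expansion table and the ^10 boost mirror the example above, the geofilt clause is omitted for brevity, and none of this is the actual Fusion query pipeline.

expansions = {
    "machine learning":         ['"data scientist"', '"data mining"', '"artificial intelligence"'],
    "research and development": ['"r&d"'],
    "software engineer":        ['"software developer"'],
    "hadoop":                   ['"big data"', "hbase", "hive"],
    "java":                     ["j2ee"],
}

def expand_query(concepts, boost=10):
    """AND the recognized concepts together; within each concept, boost the original
    phrase and OR in its related concepts from the knowledge graph."""
    clauses = []
    for concept in concepts:
        original = f'"{concept}"^{boost}' if " " in concept else f"{concept}^{boost}"
        clauses.append("(" + " OR ".join([original] + expansions.get(concept, [])) + ")")
    return " AND ".join(clauses)

print(expand_query(["machine learning", "research and development",
                    "software engineer", "hadoop", "java"]))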
96. Demo Data
Places (also includes geonames database)
Entities (includes search commands)
Text Content
[ Web crawl of restaurant and product reviews sites ]
102. popular barbeque near Activate
(popular same as "good", "top", "best")
Hotels near Activate
hotels near popular BBQ in Boston
BBQ near airports near Activate
hotels near movie theaters in Boston …
And that's really just the beginning!