The document discusses word embedding techniques, specifically Word2vec. It introduces the motivation for distributed word representations and describes the Skip-gram and CBOW architectures. Word2vec produces word vectors that encode linguistic regularities, with simple examples showing that words with similar relationships have similar vector offsets. Evaluation shows Word2vec outperforming previous methods, and its word vectors are now widely used in NLP applications.
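As a toy illustration of the Skip-gram architecture mentioned above, the sketch below shows how the (center, context) training pairs that Word2vec learns from are extracted; this is a simplified assumption for illustration, not the gensim API or the full training loop.

```python
# Toy illustration of Skip-gram training-pair generation. Real Word2vec
# learns dense vectors from pairs like these via a shallow neural network;
# here we only show how the pairs are extracted from text.

def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs within the given window size."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs[:4])
```

CBOW simply reverses the direction: it predicts the center word from the surrounding context words rather than the other way around.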
Google Lighthouse is super valuable but it only checks one page at a time.
Hamlet will show you how to get it to check all pages of a site, and how to run automated Lighthouse checks on-demand at scheduled intervals and from automated tests.
He'll also cover how to set performance budgets, how to get alerts when budgets are exceeded, and how to aggregate page reports using BigQuery and Google Data Studio.
Croud Presents: How to Build a Data-driven SEO Strategy Using NLP - Daniel Liddle
This talk explores how you can harness the huge amounts of data available to build an effective, empirically led SEO strategy using machine-learning resources such as natural language processing (NLP). It includes useful and practical tips on areas such as topic modelling, categorisation, and clustering, so you can start using NLP in your own SEO strategy right away.
Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa... - Dawn Anderson MSc DigM
Structured data accounts for only a small part of the web, and the problem grows as the volume of online content grows; schema markup is a drop in the ocean. However, this is being addressed in the natural language research space in the form of dense retrieval and other developments such as Sentence BERT and FAISS. Utilising heuristics such as umbrella and sidecar pages will help send clues and ensure search engines rank the right pages from your site for SEO.
Using command line to save time on common SEO tasks - Dino Kukic
The presentation for my BrightonSEO talk in October 2022. It covers the 'why' of the command line, provides a basic crash course, and demonstrates CLI tools for SEO.
Lexical Semantics, Semantic Similarity and Relevance for SEO - Koray Tugberk GUBUR
Lexical semantics and the relations between words include relations of superiority, inferiority, part, whole, opposition, and sameness between word meanings. The same word can be a meronym, hyponym, or antonym of another word, depending on the word before or after it. The lexical relation value of the first word can affect the structure of the next word, affecting the context of the sentence and the Information Retrieval Score. The Information Retrieval (IR) Score measures how closely content is related to a query, how close the different variants of the related query are, and how the structure processed by the search engine's query processor maps to the relevant document. A higher IR Score represents better relevance and likely click satisfaction.
The problem with a semi-structured, distracting context is that if a document is not configured for a single topic, the IR Score can be diluted across the two different contexts, resulting in relative rank lost to another textual document.
IR Score dilution involves badly structured lexical relations along with poor word proximity. Relevant words that complete each other within the meaning map should be used close together, within a paragraph or section of the document, to signal the context more clearly and increase the IR Score. A search engine can check whether the document contains hyponyms of the query words. A possible query prediction can be generated from the hypernyms of the query. A search engine can also check only the anchor texts to see whether a word falls within the "hyponym distance", which represents the hyponym depth between two different words.
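The hyponym check described above can be sketched with a hand-made taxonomy; the word lists here are illustrative assumptions, and a production system would use a lexical resource such as WordNet.

```python
# Toy sketch: does a document contain hyponyms of a query word?
# The taxonomy is hand-made for illustration only.

HYPONYMS = {
    "vehicle": {"car", "truck", "bicycle"},
    "flower": {"rose", "tulip", "daisy"},
}

def contains_hyponym(query_word, document_tokens):
    """Return the hyponyms of query_word that appear in the document."""
    return HYPONYMS.get(query_word, set()) & set(document_tokens)

doc = "we compared every car and truck on the market".split()
print(contains_hyponym("vehicle", doc))
```

A document about vehicles that never uses the word "vehicle" can still signal its topic through the hyponyms it does contain, which is the intuition behind the check.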
Lexical relations can represent the semantic annotations of a document. A semantic annotation is a word that describes the document overall, in terms of the category and main context that carry its purpose. It can contain the document's main entity, or a general concept covering a broader meaning area (knowledge domain). Semantic annotations can be generated from the lexical relations between words, can be used to match the document to the query, and are factors in a better IR Score.
A search engine can generate phrase patterns from the lexical relationships between words within queries or documents. A phrase pattern contains sections that define a concept with qualifiers. Phrase patterns can contain a hyponym just after an adjective, or a hypernym with the antonym of the same adjective. Most of these connections and patterns are used within a Recurrent Neural Network (RNN) for next-word prediction. A phrase pattern helps a search engine increase its confidence score for relating the document to a specific query, or to the meaning of the query.
The Python Cheat Sheet for the Busy Marketer - Hamlet Batista
What percentage of an inbound marketer's day doesn't involve working with spreadsheets? How much of this work is time-consuming and repetitive? In this interactive session, you will learn how to manipulate Google Sheets to automate common data analysis workflows using Python, a very easy-to-use programming language.
Search Query Processing: The Secret Life of Queries, Parsing, Rewriting & SEO - Koray Tugberk GUBUR
Query processing covers query term weight calculation, query augmentation, query context definition, and more. Query understanding and query clustering are related Information Retrieval tasks for search engines. To improve search engine optimisation efforts and project results, organic search performance optimisers need to implement query processing methodologies. Digital marketing and SEO are connected to each other. Understanding a query includes query parsing, query rewriting, question generation, and answer pairing. Multi-stage query processing, candidate answer passages, and answer term weighting are some of the concepts Google Search uses to parse queries.
The Secret Life of Queries, Parsing, Rewriting & SEO was presented at BrightonSEO in April 2022. The talk focused on explaining theoretical SEO and practical SEO examples together.
Query processing methodologies go beyond synonym matching or synonym finding. They involve multiple aspects of words and their meanings: the theme of words, the centrality of words, attention windows, context windows, word co-occurrence matrices, GloVe, Word2Vec, word embeddings, character embeddings, and more.
Themes of words involve word probabilities, as in the Continuous Bag of Words (CBOW) model.
The search engine optimisation community focuses on keyword research by matching queries. Query processing involves query word order changes, query word type changes, query word combination changes, query phrase synonym usage, query question generation, and query clustering. Query processing and document processing are correlated: query processing is about understanding a query, while document processing is about processing a web document. Both feed the ranking algorithms. A better ranking algorithm requires better query understanding, and achieving better rankings as an SEO requires a better understanding of the search engine. Thus, understanding query processing methods is necessary.
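A minimal sketch of a few of the query-processing operations listed above (word-order change, synonym substitution, and question generation); the synonym table and question template are illustrative assumptions, not any search engine's actual rules.

```python
# Toy query rewriter: generates variants of a query via word-order change,
# per-word synonym substitution, and a question template.

SYNONYMS = {"cheap": ["affordable", "low-cost"], "laptop": ["notebook"]}

def rewrite_query(query):
    words = query.split()
    variants = set()
    # word-order change: a simple reversal variant
    variants.add(" ".join(reversed(words)))
    # synonym substitution, one word at a time
    for i, w in enumerate(words):
        for syn in SYNONYMS.get(w, []):
            variants.add(" ".join(words[:i] + [syn] + words[i + 1:]))
    # question generation from a fixed template
    variants.add("what is the best " + query + "?")
    variants.discard(query)  # keep only true rewrites
    return variants

variants = rewrite_query("cheap laptop")
print(sorted(variants))
```

Clustering the resulting variants by shared intent would be the next step a search engine takes before matching them against documents.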
Search query processing is the implementation of query processing for search engines. A search query is the phrase a search engine user types when searching; search queries reflect organic search behaviour. Search intent understanding and search intent grouping are two different things, but query templates, question templates, and document templates work together. A web search engine answers millions of queries every day, so search query processing is a fundamental task for search engine optimisation and search engine result page optimisation.
The "Semantic Search Engine: Query Processing" slides from Koray Tuğberk GÜBÜR supported the presentation of "Search Query Processing: The Secret Life of Queries, Parsing, Rewriting & SEO". The presentation was created by Rebecca Berbel.
Many thanks to the Google engineers who created the Semantic Search Engine patents, including Larry Page.
Semantic Search Engine: Semantic Search and Query Parsing with Phrases and En... - Koray Tugberk GUBUR
Semantic search engines can understand human language to analyse the need behind a query. Instead of focusing on string or word matching, a semantic search engine focuses on concepts, intents, and the relations of named entities. Taxonomy, ontology, onomastics, semantic role labelling, relation detection, lexical semantics, and entity extraction, recognition, and resolution can all be used by semantic search engines. In this PDF, the evolution of semantic search engines is traced through Google's research papers, patents, and official announcements. From 1998 to 2021, the evolution of search and search engines, from strings to things and from phrases to entities, is told along with the changes in query processing and parsing methodology.
As opposed to lexical search, semantic search looks for meaning, not surface matches of the query words. It attempts to increase the relevancy of results by understanding searchers' intents and the context of terms in the searchable dataspace, whether on the open web or within a closed system. The right semantic search content blends natural language, focuses on the user's intent, and considers other topics the user may be interested in.
According to some authors, ontologies, XML, and other structured data sources can be used to retrieve knowledge through semantic search. Such technologies provide a mechanism for creating formal, highly expressive representations of domain knowledge, and may allow the user to express more detailed intent during query processing.
brightonSEO - Stress Is Contagious Don't Catch It From Your Clients - Kathryn Monkcom
Agencies are made up of people. Client teams are made up of people. And people have A LOT of emotions. Juggling other people’s fears, needs and stresses is exhausting and yet, if you care about your work, you do it every day.
You’ll leave this talk with psychologically-backed coaching techniques you can use in your professional interactions to stop other people’s drama clouding your judgement and affecting your mood at work and at home.
A pipeline for reading, parsing, optimizing, and storing a log file to Parquet.
This script uses the Python pandas library and the efficient Apache Parquet format for a big speed-up and compact storage.
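The script itself is not shown here, but the parsing stage of such a pipeline can be sketched with the standard library alone; the field names assume the Apache "combined" log format, and the pandas/Parquet step is noted in the comments.

```python
import re

# Parse one line of an Apache-style access log into named fields.
# In the full pipeline, the parsed rows would be loaded into a pandas
# DataFrame and written out with DataFrame.to_parquet() for compact,
# columnar storage.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line doesn't match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

line = '203.0.113.7 - - [10/Oct/2022:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 512'
row = parse_line(line)
print(row["path"], row["status"])
```

Columnar Parquet files let later SEO analyses (status-code breakdowns, bot-hit counts per path) read only the columns they need, which is where the speed-up comes from.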
Would you like to find high intent keywords using methods that are proven to be effective?
How would you like to establish a simple yet powerful keyword strategy that will help your target audience find you more quickly?
Without a solid keyword strategy and deep understanding of your audience, your keywords are simply words on a webpage. We'll show you the finer points of keyword research that can help increase your organic visibility and give your customers exactly what they need, right when they are searching for it.
Watch this webinar and discover how to take your keyword research to the next level.
In this webinar, you’ll learn:
-All about keyword relevancy and intent.
-Keyword research best practices.
-How to identify profitable keywords.
Not having a clear keyword strategy is often the biggest obstacle to online success for enterprises and businesses.
Zack Kadish, Sr. SEO Success Manager at Conductor, will deliver an eye-opening crash course on keyword research techniques, tips, and tactics!
[BrightonSEO 2022] Unlocking the Hidden Potential of Product Listing Pages - Areej AbuAli
E-commerce websites’ product listing pages contain untapped hidden potential. This talk is all about unlocking the magic of your listing pages by making the most out of filters and internal linking. Instead of being fixated on those landing page head terms, let’s turn our attention to the indexability of long-tail pages with high conversion. Whether you work in e-commerce or not, we’ll also cover how to embed yourself within Tech teams and analyse impactful changes.
In the last few years, Artificial Intelligence applications have become more and more sophisticated and often operate like algorithmic “black boxes” for decision-making. Due to this fact, some questions naturally arise when working with these models: why should we trust a certain decision taken by these algorithms? Why and how was this prediction made? Which variables mostly influenced the prediction? The most crucial challenge with complex machine learning models is therefore their interpretability and explainability. This talk aims to illustrate an overview of the most popular explainability techniques and their application in Learning to Rank. In particular, we will examine in depth a powerful library called SHAP with both theoretical and practical insights; we will talk about its amazing tools to give an explanation of the model behaviour, especially how each feature impacts the model’s output, and we will explain to you how to interpret the results in a Learning to Rank scenario.
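To make "how each feature impacts the model's output" concrete: for a linear model, SHAP values have an exact closed form, w_i * (x_i - mean_i), where mean_i comes from a background dataset. The weights and data below are made up for illustration; the shap library itself generalises this to arbitrary models.

```python
# Exact SHAP attributions for a toy linear model. The attributions sum to
# the gap between the prediction and the baseline (the model's output on
# the background-data feature means).

weights = [0.5, -2.0, 1.0]        # illustrative model coefficients
bias = 0.1
feature_means = [1.0, 0.0, 2.0]   # illustrative background averages

def predict(x):
    return bias + sum(w * xi for w, xi in zip(weights, x))

def linear_shap(x):
    """Per-feature attribution: phi_i = w_i * (x_i - mean_i)."""
    return [w * (xi - mi) for w, xi, mi in zip(weights, x, feature_means)]

x = [2.0, 1.0, 2.0]
phi = linear_shap(x)
baseline = predict(feature_means)
print(phi, predict(x) - baseline)
```

Here the second feature pushes the prediction down the most, which is exactly the kind of per-feature explanation the talk describes reading off SHAP plots in a Learning to Rank scenario.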
Advanced Ways to Use Ahrefs (That You Didn't Know About) - Ahrefs
This is a presentation Si Quan (SQ) did for Digital Marketing Skill Share (DMSS) 2019 in Bali.
In this presentation, he covers advanced ways to use Ahrefs, like:
- how to use Ahrefs to find broken link building opportunities
- how to use Ahrefs to find "Skyscraper Technique" opportunities
- how to use Link Intersect
- how to do keyword research across the different search engines
- how to find keywords of different search intents
- how to find affiliate marketing opportunities
- how to find what forums and community sites are ranking for
- how to find keywords other sites have ranked for fast
- the "Content Explorer" hack
SEO Case Study - Hangikredi.com From 12 March to 24 September Core Update - Koray Tugberk GUBUR
Start Summary:
"131% Organic Session Increase in 5 Months
62% Impression Increase in 5 Months
144% Clicks Increase in 5 Months"
This SEO case study is about Google Core Updates and their impact on the biggest financial institution website in Turkey.
I started working at Hangikredi.com on 26 March 2019, after the company's website had been affected very negatively by the 12 March Google Core Update.
I started here while a crisis was unfolding.
I examined the website and figured out that the real problems were crawl budget, authority signals, and the relevancy-entity connection. I activated social media and Google My Business accounts, and joined financial forums and every other alternative channel. I created a news publisher network about us. I cleaned up the misleading status codes and the HTML and CSS mistakes, optimised meta tags, fixed the redirect chains, applied image compression, deleted lots of unnecessary URLs and their content, and created the internal link structure from scratch.
By the 5 June Google Core Update, we were winners again.
We had regained all of our lost traffic. Until the 1 August server attack we were okay; then, in one day, everything went wrong.
I had to start from zero again...
I optimised the website's off-page signals to regain the trust of Google AI, and supported this strategy with on-page elements.
After the 24 September Google Core Update, there was another success. We broke the site-history records for crawl load/rate, average site position, CTR, impressions, and clicks.
In this case study, you will find the details of an SEO success story, with graphics and also some funny censored images from my life.
End Summary:
"The 12 March, 5 June, and 24 September Google Core Updates, together with the 1 August server attack, are the milestones of this SEO case study. You will find all the details from our point of view. I hope you will like it."
Trying to establish a more consistent SEO structure within your organization?
Wish every SEO fire had a more standardized, easy-to-follow solution?
We know – no two days in SEO are the same.
However, it’s surprisingly easy to find a consistent approach that provides meaningful impact.
And – it works whether you're in-house, an agency, or a freelance consultant.
Watch this webinar and learn the 4-step process that will help you tackle SEO challenges head-on as they arise.
This 4-Step SEO Waltz takes you through:
Visibility
Diagnostics
Iteration
Monitoring
Jamie Indigo and Michelle Race from Deepcrawl walk you through a four-step process that helps you meet SEO challenges head-on as they arise and stop SEO fires before they start.
SEO professionals still view the SEO process as a complex dance, but it could be a simple and practical framework for addressing challenges in various forms.
Discover how you can use the steps, pillars, and methods for more effective SEO project management within your company.
Semantic Publishing and Entity SEO - Conference 20-11-2022 - Massimiliano Geraci
Semantic publishing means publishing a page on the Internet with an added semantic layer (i.e., semantic enrichment) in the form of structured data that describes the page itself.
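A minimal example of such a semantic layer, built here with Python's json module; the property names follow schema.org's Article type, and the values are placeholders.

```python
import json

# A minimal schema.org Article annotation: the "semantic layer" that
# describes the page itself. All values are illustrative placeholders.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "datePublished": "2022-11-20",
}

# The resulting JSON-LD string would be embedded in the page inside a
# <script type="application/ld+json"> tag.
jsonld = json.dumps(article, indent=2)
print(jsonld)
```

JSON-LD is the format Google recommends for structured data because it sits in one block rather than being interleaved with the page's HTML.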
Opinion-based Article Ranking for Information Retrieval Systems: Factoids and... - Koray Tugberk GUBUR
How Do Search Engines Leverage Opinion-based Articles for Ranking?
Search engines use opinions and factoids to understand the consensus. News search engines surface different reports and opinions in their results to satisfy newsreaders' urgent information needs, and they differentiate disinformation from information to protect those readers. Google, Microsoft Bing, Yandex, and DuckDuckGo have different algorithms and priorities for classifying news sources and for prioritising news and newsworthy topics.
Corroboration of the Web Answers from the Open Web is a research paper by Amelia Marian and Minji Wu that explains how a search engine can rank information according to its accuracy.
Google has explained that Expertise-Authoritativeness-Trustworthiness is the most important group of signals for ensuring a result won't shame the search engine. Embarrassment factors include wrong information in a news title or story, or a wrong featured snippet; a search engine can be shamed by a bad result ranking on the SERP.
The concepts covered include dense retrieval, context scoring, named entity recognition, semantic role labelling, truth ranges, fix points, confidence scores, query processing, and parsing.
Context understanding requires processing the text and tokenising the words by recognising word sense. Processing the text of news articles takes time, and most of the time news search engines do not have enough of it. Thus, PageRank provides a sustainable ranking signal for news sources over time.
PageRank is a quick signal for search engines of the authenticity of a news source. Highly cited sources rank higher and stay longer in the top stories. Usually, Google protects high-PageRank sources by trusting the judgment of those websites. But fact-finding algorithms mostly do not use PageRank, unless they cannot decide from other factors or lack the resources to process the text across hundreds of sources.
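The citation-based signal described above is the classic PageRank computation. Below is a minimal power-iteration sketch over a hand-made link graph; real systems operate at web scale with sparse matrices, and the graph here is purely illustrative.

```python
# Minimal PageRank by power iteration. `links` maps each page to the
# pages it links to.

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # teleport mass: every page gets a (1 - d)/n floor
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

links = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["b"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # the most-cited page ranks highest
```

In this toy graph page "b" is linked from three of the four pages, so it accumulates the most rank, mirroring how highly cited news sources stay on top.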
News ranking algorithms differentiate opinions, reports, and breaking news from each other. News-related entities, their co-existence, and contextual relations change. Google inventors suggest differentiation of these entities from each other for a proper news categorization.
News categorization is important to match the interested topics of the users in queryless news feeds such as Google Discover. Google Discover is a queryless news feed that serves news stories according to the users' interest areas.
An opinion for news might be misleading. Some news titles might be too harsh, or strict. Search engines use these headlines to differentiate the non-trustworthy news sources from the trustworthy ones. And, opinions of journalists or their different interpretations of the events might change the rankings of a document according to the fact-finding algorithms.
Google Sheets For SEO - Tom Pool - London SEO Meetup XL
First delivered at London SEO Meetup XL on May 4th, 2022.
This talk focuses on a number of different tips and tricks that can be used to help improve the overall analysis of data within Google Sheets.
A number of formulae are covered within this talk, including:
REGEXMATCH & REGEXREPLACE
TRANSLATE
COUNTIF
VLOOKUP
INDEX MATCH
IFNA
The talk finally touches on the usage of the =QUERY formula, and how it can aid in a number of different situations. An example of creating a top ten dashboard is provided.
A sample Google Spreadsheet with all formulae discussed can be seen here:
https://docs.google.com/spreadsheets/d/1vMZIh6NWm8gRwIP-SkTP0N6BWEpYmFFge7FBR4Wqqtg/edit#gid=0
Recurrent Neural Networks have been shown to be very powerful models, as they can propagate context over several time steps. Because of this, they can be applied effectively to several problems in Natural Language Processing, such as language modelling, tagging problems, and speech recognition. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short Term Memory) and Gated Recurrent Units (GRU). We also discuss Bidirectional RNNs with an example. RNN architectures can be considered deep learning systems where the number of time steps plays the role of the depth of the network. It is also possible to build an RNN with multiple hidden layers, each having recurrent connections from the previous time steps, representing abstraction both in time and in space.
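The recurrence and the vanishing-gradient problem can be shown with a single-unit (scalar) RNN in a few lines; the weights below are illustrative assumptions, not trained values.

```python
import math

# A scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t).
w_h, w_x = 0.5, 1.0

def rnn_forward(xs, h0=0.0):
    """Return the sequence of hidden states for inputs xs."""
    hs, h = [], h0
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        hs.append(h)
    return hs

def grad_h0(hs):
    """d h_T / d h_0: a product of per-step factors w_h * (1 - h_t^2).
    Each factor is below 1, so the product shrinks toward zero as the
    sequence grows -- the vanishing gradient."""
    g = 1.0
    for h in hs:
        g *= w_h * (1.0 - h * h)
    return g

hs = rnn_forward([1.0] * 10)
print(abs(grad_h0(hs[:2])), abs(grad_h0(hs)))  # gradient decays with depth
```

LSTM and GRU cells address exactly this: their gated additive state updates keep a path along which the gradient is not repeatedly multiplied by small factors.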
Semantic Search Engine: Semantic Search and Query Parsing with Phrases and En...Koray Tugberk GUBUR
Semantic Search Engines can understand human language to analyze the need behind a query. Instead of focusing, string, or word matching, a semantic search engine focuses on concepts, intents, and relations of named entities. Taxonomy, ontology, onomastics, semantic role labeling, relation detection, lexical semantics, entity extraction, recognition, resolution can be used by semantic search engines. In this PDF file, semantic search engines' evolution will be processed based on Google Search Engine's research papers, patents, and official announcements. From 1998 to 20021, search's and search engines' evolution, from strings to things, from phrases to entities will be told along with query processing, and parsing methodology changes.
As opposed to lexical search, semantic searching searches for meaning, not meaningless matches of the query words. Semantic search attempts to increase the relevancy of results by understanding searchers' intents and the context of terms in the searchable dataspace, whether online or within a closed system. The right semantic search content is a blend of natural language, focuses on the intent of the user, and considers other topics the user may be interested in.
Ontologies, XML, and other structured data sources can be used to retrieve knowledge using semantic search according to some authors. The use of such technologies provides a mechanism for creating formal expressions of domain knowledge that are highly expressive and may allow the user to express more detailed intent during query processing.
brightonSEO - Stress Is Contagious Don't Catch It From Your ClientsKathryn Monkcom
Agencies are made up of people. Client teams are made up of people. And people have A LOT of emotions. Juggling other people’s fears, needs and stresses is exhausting and yet, if you care about your work, you do it every day.
You’ll leave this talk with psychologically-backed coaching techniques you can use in your professional interactions to stop other people’s drama clouding your judgement and affecting your mood at work and at home.
A pipeline for reading, parsing, optimizing, and storing a log file as Parquet.
This script uses the Python pandas library together with the efficient Apache Parquet format for a big speed-up and compact storage.
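As a rough sketch of such a pipeline (the sample log lines, regex, column names, and dtypes here are invented for illustration; the actual script may differ):

```python
# Hypothetical sketch: parse web-server log lines into a pandas DataFrame,
# optimize dtypes, and store the result as Parquet.
import pandas as pd

LOG = """127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
10.0.0.5 - - [10/Oct/2023:13:55:40 +0000] "POST /api/login HTTP/1.1" 401 512"""

# Extract fields with a regex (Common Log Format); group names are illustrative.
pattern = (r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
           r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<size>\d+)')
df = pd.Series(LOG.splitlines()).str.extract(pattern)

# Optimize dtypes before storage: small integers for numeric columns,
# categories for low-cardinality columns.
df["status"] = df["status"].astype("int16")
df["size"] = df["size"].astype("int32")
df["method"] = df["method"].astype("category")

try:
    # Writing Parquet requires a pyarrow or fastparquet engine.
    df.to_parquet("access_log.parquet", index=False)
except ImportError:
    pass  # no Parquet engine installed in this environment
```

Columnar storage plus compact dtypes is where most of the speed-up comes from: Parquet readers can skip columns entirely and decode categories as dictionaries.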
Would you like to find high intent keywords using methods that are proven to be effective?
How would you like to establish a simple yet powerful keyword strategy that will help your target audience find you more quickly?
Without a solid keyword strategy and deep understanding of your audience, your keywords are simply words on a webpage. We'll show you the finer points of keyword research that can help increase your organic visibility and give your customers exactly what they need, right when they are searching for it.
Watch this webinar and discover how to take your keyword research to the next level.
In this webinar, you’ll learn:
-All about keyword relevancy and intent.
-Keyword research best practices.
-How to identify profitable keywords.
Having a clear keyword strategy is often the biggest obstacle to online success for enterprises and businesses.
Zack Kadish, Sr. SEO Success Manager at Conductor, will deliver an eye-opening crash course on keyword research techniques, tips, and tactics!
[BrightonSEO 2022] Unlocking the Hidden Potential of Product Listing PagesAreej AbuAli
E-commerce websites’ product listing pages contain untapped hidden potential. This talk is all about unlocking the magic of your listing pages by making the most out of filters and internal linking. Instead of being fixated on those landing page head terms, let’s turn our attention to the indexability of long-tail pages with high conversion. Whether you work in e-commerce or not, we’ll also cover how to embed yourself within Tech teams and analyse impactful changes.
In the last few years, Artificial Intelligence applications have become more and more sophisticated and often operate like algorithmic “black boxes” for decision-making. Due to this fact, some questions naturally arise when working with these models: why should we trust a certain decision taken by these algorithms? Why and how was this prediction made? Which variables mostly influenced the prediction? The most crucial challenge with complex machine learning models is therefore their interpretability and explainability. This talk aims to illustrate an overview of the most popular explainability techniques and their application in Learning to Rank. In particular, we will examine in depth a powerful library called SHAP with both theoretical and practical insights; we will talk about its amazing tools to give an explanation of the model behaviour, especially how each feature impacts the model’s output, and we will explain to you how to interpret the results in a Learning to Rank scenario.
Advanced Ways to Use Ahrefs (That You Didn't Know About)Ahrefs
This is a presentation Si Quan (SQ) did for Digital Marketing Skill Share (DMSS) 2019 in Bali.
In this presentation, he covers advanced ways to use Ahrefs, like:
- how to use Ahrefs to find broken link building opportunities
- how to use Ahrefs to find "Skyscraper Technique" opportunities
- how to use Link Intersect
- how to do keyword research across the different search engines
- how to find keywords of different search intents
- how to find affiliate marketing opportunities
- how to find what forums and community sites are ranking for
- how to find keywords other sites have ranked for fast
- the "Content Explorer" hack
SEO Case Study - Hangikredi.com From 12 March to 24 September Core UpdateKoray Tugberk GUBUR
Start Summary:
"131% Organic Session Increase in 5 Months
62% Impression Increase in 5 Months
144% Clicks Increase in 5 Months"
This SEO case study is about Google Core Updates and their impact on the website of the biggest financial institution in Turkey.
I started working at Hangikredi.com on 26 March 2019, shortly after the company's website had been hit hard by the 12 March Google Core Update.
In other words, I started in the middle of a crisis.
I examined the website and figured out that the real problems were crawl budget, authority signals, and the relevancy-entity connection. I activated social media and Google My Business accounts, entered financial forums and every other alternative channel, and created a news publisher network about us. I cleaned up misleading status codes and HTML and CSS mistakes, optimised meta tags, fixed the redirect chains, used image compression, deleted lots of unnecessary URLs and their content, and rebuilt the internal link structure from scratch.
By the 5 June Google Core Update, we were winners again.
We had regained all of our lost traffic. Until the 1 August server attack we were okay; then, in one day, everything went wrong.
I had to start from zero again...
I kept optimising the website's off-page signals to regain the trust of Google's AI, and I supported this strategy with on-page elements.
After the 24 September Google Core Update, there was another success. We broke the site's historical records for crawl load/rate, average position, CTR, impressions, and clicks.
In this case study, you will find the details of an SEO success story with graphics, and also some funny censored images from my life.
End Summary:
"12 March, 5 june and 24 September Google Core Updates with 1 August Server Atack are the milestones of this SEO Casse Study. You will find all details from our view of point. I hope you will like it."
Trying to establish a more consistent SEO structure within your organization?
Wish every SEO fire had a more standardized, easy-to-follow solution?
We know – no two days in SEO are the same.
However, it’s surprisingly easy to find a consistent approach that provides meaningful impact.
And – it works whether you're in-house, an agency, or a freelance consultant.
Watch this webinar and learn the 4-step process that will help you tackle SEO challenges head-on as they arise.
This 4-Step SEO Waltz takes you through:
Visibility
Diagnostics
Iteration
Monitoring
Jamie Indigo and Michelle Race from Deepcrawl walk you through a four-step process that helps you meet SEO challenges head-on as they arise and stop SEO fires before they start.
SEO professionals still view the SEO process as a complex dance, but it could be a simple and practical framework for addressing challenges in various forms.
Discover how you can use the steps, pillars, and methods for more effective SEO project management within your company.
Semantic Publishing and Entity SEO - Conteference 20-11-2022Massimiliano Geraci
Semantic Publishing is publishing a page on the Internet by adding a semantic layer (i.e., semantic enrichment) in the form of structured data that describes the page itself.
Opinion-based Article Ranking for Information Retrieval Systems: Factoids and...Koray Tugberk GUBUR
How Search Engines Leverage Opinion-based Articles for Ranking?
Search engines use opinions and factoids to understand the consensus. News search engines surface different reports and opinions in their results to satisfy newsreaders' urgent information needs, and they differentiate disinformation from information to protect those readers. Google, Microsoft Bing, Yandex, and DuckDuckGo have different algorithms and priorities for classifying news sources and for prioritising news and newsworthy topics.
Corroboration of the Web Answers from the Open Web is a research paper by Amelia Marian and Minji Wu that explains how a search engine can rank information according to its accuracy.
Google has explained that Expertise-Authoritativeness-Trustworthiness is the most important group of signals for ensuring that a result won't embarrass the search engine. Embarrassment factors for search engines include wrong information in a news title or a wrong featured snippet. A search engine can be embarrassed by a bad result ranking on the SERP.
The key concepts here include dense retrieval, context scoring, named entity recognition, semantic role labeling, truth ranges, fix points, confidence scores, query processing, and parsing.
Context understanding requires processing the text and tokenizing the words while recognizing word sense. Processing the text of news articles takes time, and most of the time news search engines do not have that time. Thus, PageRank provides a sustainable, timely way to rank news sources.
PageRank is a quick signal for search engines of the authenticity of a news source. Highly cited sources are ranked higher, and stay longer in Top Stories. Usually, Google protects high-PageRank sources by trusting the judgment of those websites. But fact-finding algorithms mostly do not use PageRank, unless they cannot decide by looking at other factors or do not have enough resources to process the text across hundreds of sources.
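As a rough illustration of why PageRank is cheap enough to serve as a quick signal, a minimal power-iteration sketch over an invented four-page link graph (the graph, damping factor, and iteration count are illustrative, not from the text):

```python
# Minimal power-iteration sketch of PageRank on a toy 4-page link graph.
import numpy as np

# A[i, j] = 1 means page j links to page i; the graph is invented.
A = np.array([[0, 0, 1, 1],
              [1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 0]], dtype=float)
M = A / A.sum(axis=0)          # column-stochastic transition matrix
d, n = 0.85, A.shape[0]        # damping factor, number of pages
r = np.full(n, 1.0 / n)        # start from a uniform score vector
for _ in range(100):           # iterate until the scores stabilize
    r = (1 - d) / n + d * M @ r
print(r.round(3))              # per-page authority scores, summing to 1
```

The whole computation is a handful of sparse matrix-vector products over the link graph, so it scales to web-sized corpora without ever reading the page text.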
News ranking algorithms differentiate opinions, reports, and breaking news from each other. News-related entities, their co-existence, and contextual relations change. Google inventors suggest differentiation of these entities from each other for a proper news categorization.
News categorization is important to match the interested topics of the users in queryless news feeds such as Google Discover. Google Discover is a queryless news feed that serves news stories according to the users' interest areas.
An opinion for news might be misleading. Some news titles might be too harsh, or strict. Search engines use these headlines to differentiate the non-trustworthy news sources from the trustworthy ones. And, opinions of journalists or their different interpretations of the events might change the rankings of a document according to the fact-finding algorithms.
Google Sheets For SEO - Tom Pool - London SEO Meetup XLTom Pool
First delivered at London SEO Meetup XL on May 4th, 2022.
This talk focuses on a number of different tips and tricks that can be used to help improve the overall analysis of data within Google Sheets.
A number of formulae are covered within this talk, including:
REGEXMATCH & REGEXREPLACE
TRANSLATE
COUNTIF
VLOOKUP
INDEX MATCH
IFNA
The talk finally touches on the usage of the =QUERY formula, and how it can aid in a number of different situations. An example of creating a top ten dashboard is provided.
A sample Google Spreadsheet with all formulae discussed can be seen here:
https://docs.google.com/spreadsheets/d/1vMZIh6NWm8gRwIP-SkTP0N6BWEpYmFFge7FBR4Wqqtg/edit#gid=0
Search is undergoing dramatic changes taking it away from a focus on keywords and websites, towards conversational search and app indexes. In this presentation Distilled discusses the ways search is fundamentally changing including compound queries, implicit search signals, user signals as a ranking factor, the move from keywords to intents, and the drive towards data driven search.
My talk at the Stockholm Natural Language Processing Meetup. I explained how word2vec is implemented and how to use it in Python with gensim. When words are represented as points in space, the spatial distance between words describes a similarity between these words. In this talk, I explore how to use this in practice and how to visualize the results (using t-SNE)
Analysis of the levels of realizability of the technical energy-saving potential by energy-t...Yurii Chernukha
The aim of this work is to develop the following algorithms: evaluating energy-saving measures by quantitative energy-efficiency indicators, and analysing the levels of realizability of the technical energy-saving potential according to energy-technology criteria for industrial, commercial, and residential facilities.
Tools and tips for simplifying startup formation.Alex Shoer
Models to help you setup your startup in the right way. With an equity structure that benefits all, vesting to ensure no one runs off with equity and advisor incentives to bring in the senior experts you need.
These are all the Colombian entrepreneurs from Colombia who are very important to me and to my whole class at the school where I study, which I adore and which is really good, here in Medellín.
How can faculty or students join the 70,000 contributors to Wikipedia, the world's largest knowledge base?
Learn how educators can use Wikipedia in the classroom!
For more information and resources:
https://en.wikipedia.org/wiki/Wikipedia:Meetup/NYC/Fordham_October_2016
http://facultyedtechpd.wikispaces.com/Wikipedia+for+Educators
Redes socialesparaempresas - ActualizadaAdriana Alban
As I noted in the first version of this presentation, everything changes, and the lens through which I personally view Ecuadorian companies' approach to the social web has changed as well. One more step is needed between the management decision to establish a corporate presence on the Internet and the launch of the communities: the step of internal and external assessment, strategic planning, and training. I call all of this a digital strategy, that is, being clear about which path to follow to achieve a given objective within a set time. Get to know this approach and, along the way, me as a consultant and the team of people I rely on.
Bridging the gap between AI and UI - DSI Vienna - full versionLiad Magen
This is a summary of the latest research on model interpretability, including Recurrent neural networks (RNN) for Natural Language Processing (NLP) in terms of what's in an RNN.
In addition, it contains suggestion to improve machine learning based user interface, to engage users and encourage them to contribute data to adapt the models to them.
ODSC East: Effective Transfer Learning for NLPindico data
Presented by indico co-founder Madison May at ODSC East.
Abstract: Transfer learning, the practice of applying knowledge gained on one machine learning task to aid the solution of a second task, has seen historic success in the field of computer vision. The output representations of generic image classification models trained on ImageNet have been leveraged to build models that detect the presence of custom objects in natural images. Image classification tasks that would typically require hundreds of thousands of images can be tackled with mere dozens of training examples per class thanks to the use of these pretrained representations. The field of natural language processing, however, has seen more limited gains from transfer learning, with most approaches limited to the use of pretrained word representations. In this talk, we explore parameter- and data-efficient mechanisms for transfer learning on text, and show practical improvements on real-world tasks. In addition, we demo the use of Enso, a newly open-sourced library designed to simplify benchmarking of transfer learning methods on a variety of target tasks. Enso provides tools for the fair comparison of varied feature representations and target task models as the amount of training data made available to the target model is incrementally increased.
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
Talk given at the 8th Forum for Information Retrieval Evaluation (FIRE, http://fire.irsi.res.in/fire/2016/), December 10, 2016, and at the Qatar Computing Research Institute (QCRI), December 15, 2016.
Presentation of "Challenges in transfer learning in NLP" from Madrid Natural Language Processing Meetup Event, May, 2019.
https://www.meetup.com/es-ES/Madrid-Natural-Language-Processing-meetup/
Practical related work in repository: https://github.com/laraolmos/madrid-nlp-meetup
GPT-2: Language Models are Unsupervised Multitask LearnersYoung Seok Kim
Review of paper
Language Models are Unsupervised Multitask Learners
(GPT-2)
by Alec Radford et al.
Paper link: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
YouTube presentation: https://youtu.be/f5zULULWUwM
(Slides are written in English, but the presentation is done in Korean)
Beyond the Symbols: A 30-minute Overview of NLPMENGSAYLOEM1
This presentation delves into the world of Natural Language Processing (NLP), exploring its goal to make human language understandable to machines. The complexities of language, such as ambiguity and complex structures, are highlighted as major challenges. The talk underscores the evolution of NLP through deep learning methodologies, leading to a new era defined by large-scale language models. However, obstacles like low-resource languages and ethical issues including bias and hallucination are acknowledged as enduring challenges in the field. Overall, the presentation provides a condensed, yet comprehensive view of NLP's accomplishments and ongoing hurdles.
How can text-mining leverage developments in Deep Learning? Presentation at ...jcscholtes
How can text-mining leverage developments in Deep Learning?
Text-mining focuses primarily on extracting complex patterns from unstructured electronic data sets and applying machine learning for document classification. During the last decade, a generation of efficient and successful algorithms has been developed using bag-of-words models to represent document content, together with statistical and geometrical machine learning algorithms such as Conditional Random Fields and Support Vector Machines. These algorithms require relatively little training data and are fast on modern hardware. However, performance seems to be stuck around 90% F1.
In computer vision, deep learning has shown great success, breaking the 90% barrier in many applications. In addition, deep learning shows new successes in transfer learning and self-learning such as reinforcement learning. Dedicated hardware helped overcome the computational challenges, and methods such as training-data augmentation removed the need for unrealistically large data sets.
So it would make sense to apply deep learning to textual data as well. But how do we represent textual data? There are many different methods for word embeddings, and as many deep learning architectures. Training-data augmentation, transfer learning, and reinforcement learning are not fully defined for textual data.
This is a survey of dialog systems and question answering, covering the three generations: (1) symbolic rule/template-based QA; (2) data-driven learning; (3) data-driven deep learning. It also presents the available frameworks and datasets for dialog systems.
Introduction to NLP (Natural Language Processing) on AzurePlain Concepts
This talk aims to introduce the audience to the world of natural language processing (NLP). The talk itself consists of three blocks: 1. The state of the art in NLP: what is being used today? Which problems can we solve, and which can we not? Techniques commonly used in industry. 2. An introduction to basic concepts for tackling an ML project with NLP: preprocessing, vectorization, and embeddings (word2vec, fastText, and basic techniques such as tf-idf and counting), plus classifiers. 3. A small practical example deployed using Azure Machine Learning.
Natural Language Processing: L01 introductionananth
This presentation introduces the course Natural Language Processing (NLP) by enumerating a number of applications, course positioning, challenges presented by Natural Language text and emerging approaches to topics like word representation.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and for making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We ended with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
2. Structure of this talk
• Motivation
• Word2vec
• Architecture
• Evaluation
• Examples
• Discussion
3. Motivation
Representation of text is very important for the performance of many real-world
applications: search, ads recommendation, ranking, spam filtering, …
• Local representations
• N-grams
• 1-of-N coding
• Bag-of-words
• Continuous representations
• Latent Semantic Analysis
• Latent Dirichlet Allocation
• Distributed Representations
4. Motivation: example
Suppose you want to quickly build a classifier:
• Input = keyword, or user query
• Output = is user interested in X? (where X can be a service, ad, …)
• Toy classifier: is X capital city?
• Getting training examples can be difficult, costly, and time consuming
• With local representations of input (1-of-N), one will need many
training examples for decent performance
5. Motivation: example
Suppose we have a few training examples:
• (Rome, 1)
• (Turkey, 0)
• (Prague, 1)
• (Australia, 0)
• …
Can we build a good classifier without much effort?
6. Motivation: example
Suppose we have a few training examples:
• (Rome, 1)
• (Turkey, 0)
• (Prague, 1)
• (Australia, 0)
• …
Can we build a good classifier without much effort?
YES, if we use good pre-trained features.
7. Motivation: example
Pre-trained features: leverage the vast amounts of unannotated text data
• Local features:
• Prague = (0, 1, 0, 0, ..)
• Tokyo = (0, 0, 1, 0, ..)
• Italy = (1, 0, 0, 0, ..)
• Distributed features:
• Prague = (0.2, 0.4, 0.1, ..)
• Tokyo = (0.2, 0.4, 0.3, ..)
• Italy = (0.5, 0.8, 0.2, ..)
8. Distributed representations
• We want to learn representations in which Prague, Rome, Berlin,
Paris, etc. end up close to each other
• We do not want to just cluster words: we seek representations that
capture multiple degrees of similarity: Prague is similar to Berlin
in one way, and to the Czech Republic in another
• Can this even be done without manually created databases like
WordNet / knowledge graphs?
9. Word2vec
• Simple neural nets can be used to obtain distributed representations
of words (Hinton et al., 1986; Elman, 1991; …)
• The resulting representations have interesting structure – the vectors
can be obtained using a shallow network (Mikolov, 2007)
10. Word2vec
• Deep learning for NLP (Collobert & Weston, 2008): let’s use deep
neural networks! It works great!
• Back to shallow nets: the Word2vec toolkit (Mikolov et al., 2013) -> much
more efficient than deep networks for this task
11. Word2vec
Two basic architectures:
• Skip-gram
• CBOW
Two training objectives:
• Hierarchical softmax
• Negative sampling
Plus a bunch of tricks: weighting of distant words, sub-sampling of frequent
words
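To make the Skip-gram objective concrete, here is a minimal sketch of how (center, context) training pairs are generated from a window around each word; the sentence and window size are illustrative:

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for every context word within
    `window` positions of each center word."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield center, tokens[j]

pairs = list(skipgram_pairs("the quick brown fox".split(), window=1))
print(pairs)
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

Skip-gram trains the network to predict each context word from the center word; CBOW reverses the direction, predicting the center word from the (averaged) context.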
14. Word2vec: Linguistic Regularities
• After training is finished, the rows of the weight matrix between the input
and hidden layers are the word feature vectors
• The word vector space implicitly encodes many regularities among words:
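Why the input-to-hidden weights are the word vectors: multiplying a one-hot input by the weight matrix just selects one row, so after training, row i of that matrix is the feature vector of word i. A sketch with made-up numbers:

```python
# Toy |vocab| x dim input->hidden weight matrix (made-up values).
W = [
    [0.5, 0.8, 0.2],  # row 0: Italy
    [0.2, 0.4, 0.1],  # row 1: Prague
    [0.2, 0.4, 0.3],  # row 2: Tokyo
]

def hidden_layer(one_hot_input, W):
    """The matrix product one_hot_input @ W, written out explicitly."""
    return [sum(one_hot_input[i] * W[i][d] for i in range(len(W)))
            for d in range(len(W[0]))]

# A one-hot input selects exactly one row of W:
print(hidden_layer([0, 1, 0], W))  # [0.2, 0.4, 0.1] == W[1], Prague's vector
```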
15. Linguistic Regularities in Word Vector Space
• The resulting distributed representations of words contain a
surprising amount of syntactic and semantic information
• There are multiple degrees of similarity among words:
• KING is similar to QUEEN as MAN is similar to WOMAN
• KING is similar to KINGS as MAN is similar to MEN
• Simple vector operations with the word vectors provide very intuitive
results (King – man + woman ~= Queen)
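The King – man + woman ≈ Queen operation can be sketched with hand-picked toy vectors (real word2vec vectors are learned, not constructed like this):

```python
import math

# Hand-picked 2-d toy vectors: the "gender" offset (0, 1) is the same
# for the plain pair and the royal pair.
vec = {
    "man":   [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king":  [2.0, 0.0],
    "queen": [2.0, 1.0],
    "apple": [0.0, 3.0],  # unrelated distractor
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# king - man + woman
target = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]

# Nearest neighbour by cosine similarity, excluding the query words:
best = max((w for w in vec if w not in ("king", "man", "woman")),
           key=lambda w: cosine(vec[w], target))
print(best)  # queen
```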
16. Linguistic Regularities - Evaluation
• The regularity of the learned word vector space was evaluated using a
test set of about 20K analogy questions
• The test set contains both syntactic and semantic questions
• Comparison to the previous state of the art (pre-2013)
20. Summary and discussion
• Word2vec: much faster and far more accurate than previous neural-net-based
solutions - the training speed-up compared to the prior state of the art is
more than 10,000× (literally from weeks to seconds)
• Features derived from word2vec are now used across all the big IT companies
in plenty of applications (search, ads, …)
• Also very popular in the research community: a simple way to boost
performance in many NLP tasks
• Main reasons for success: very fast, open source, and the resulting
features are easy to use to boost many applications (even non-NLP)
21. Follow up work
Baroni, Dinu, Kruszewski (2014): Don't count, predict! A systematic
comparison of context-counting vs. context-predicting semantic vectors
• It turns out neural-network-based approaches are very close to traditional
distributional semantics models
• Luckily, word2vec significantly outperformed the best previous
models across many tasks
22. Follow up work
Pennington, Socher, Manning (2014): GloVe: Global Vectors for Word
Representation
• A word2vec variant from Stanford: almost identical, but with a new name
• In some sense a step back: word2vec counts co-occurrences and does
dimensionality reduction jointly, while GloVe is a two-pass algorithm
23. Follow up work
Levy, Goldberg, Dagan (2015): Improving distributional similarity with
lessons learned from word embeddings
• Hyper-parameter tuning is important: debunks the claims of GloVe's
superiority
• Compares models trained on the same data (unlike the GloVe paper…):
word2vec is faster, its vectors are better, and it uses much less memory
• Many others ended up with similar conclusions (Radim Rehurek, …)
24. Final notes
• Word2vec is successful because it is simple, but it cannot be applied
everywhere
• For modeling sequences of words, consider recurrent networks
• Do not sum word vectors to obtain sentence representations; it will not
work well
• Be careful about the hype, as always … the most cited papers often contain
non-reproducible results
25. References
• Mikolov (2007): Language Modeling for Speech Recognition in Czech
• Collobert, Weston (2008): A unified architecture for natural language processing: Deep neural networks with
multitask learning
• Mikolov, Karafiat, Burget, Cernocky, Khudanpur (2010): Recurrent neural network based language model
• Mikolov (2012): Statistical Language Models Based on Neural Networks
• Mikolov, Yih, Zweig (2013): Linguistic Regularities in Continuous Space Word Representations
• Mikolov, Chen, Corrado, Dean (2013): Efficient estimation of word representations in vector space
• Mikolov, Sutskever, Chen, Corrado, Dean (2013): Distributed representations of words and phrases and their
compositionality
• Baroni, Dinu, Kruszewski (2014): Don't count, predict! A systematic comparison of context-counting vs.
context-predicting semantic vectors
• Pennington, Socher, Manning (2014): GloVe: Global Vectors for Word Representation
• Levy, Goldberg, Dagan (2015): Improving distributional similarity with lessons learned from word
embeddings