Search Quality Implementation
“Analyzing SERPs Ranking All
Over the World
on Google”
- Anil Kumar Sah (072BEX450)
IOE, Pulchowk Campus
Truly,
“Search was Big Data before there was Big Data”
• Search has always been concerned with
• extremely large datasets, and
• statistical analysis of those sets, both for indexing (i.e. large-scale batch
processing) and at query time (i.e. high-speed real-time processing).
• Billion-document databases have existed in search engines for decades.
• What search has always lacked is a good, inclusive framework. Do we have
that framework now? No. But we have the vision of one, and
it lives on Big Data.
But what would we do with such a framework?
• Techniques like:
• Link Counting / PageRank / Anchor Text / Popularity
• were too hard, too expensive, and too unreliable to be used except in
complex, hand-crafted implementations.
• These are all techniques using external references to improve search
relevancy.
• Link counting uses inbound links into a document (i.e. links from other
documents) to boost relevancy.
• Google's "PageRank" has the same goal, but is mathematically more
sophisticated.
• "Anchor text" can influence relevancy by taking into account how other
people reference your document.
• Popularity boosts documents that are known to be popular based
on how often those documents are clicked.
• These algorithms become more than just possible; they
are a natural evolution. After all, Google invented
MapReduce specifically to handle PageRank
calculations for Google.com (a toy version of the calculation is sketched after this slide).
• Because of the lack of an appropriate framework, all
of this incredibly valuable external data is underused in
most search implementations.
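To make the PageRank idea concrete, here is a toy power-iteration sketch in Python. The four-page link graph and all numbers are invented for illustration; this is a minimal sketch, not Google's MapReduce implementation:

import numpy as np

# Hypothetical 4-page link graph: links[i] lists the pages that page i links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = 4
damping = 0.85

# Column-stochastic transition matrix: M[j, i] = probability of moving i -> j.
M = np.zeros((n, n))
for i, outlinks in links.items():
    for j in outlinks:
        M[j, i] = 1 / len(outlinks)

# Power iteration: repeatedly apply the damped transition until ranks converge.
rank = np.full(n, 1 / n)
for _ in range(100):
    new_rank = (1 - damping) / n + damping * M @ rank
    if np.abs(new_rank - rank).sum() < 1e-9:
        break
    rank = new_rank

print(rank)  # pages with more (and better-ranked) inbound links score higher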
• Search Engine Results Pages (SERPs): pages displayed by search
engines in response to a query by a searcher.
• Main component of the SERP: the listing of results that the search
engine returns in response to a keyword query.
• Pages may also contain other results such as advertisements.
• Two general types:
• Organic search (i.e., retrieved by the search engine's algorithm)
• Sponsored search (i.e., advertisements)
• Several pages are returned in response to a single search query, due to the huge
number of available items.
What kind of pages are viewed as high quality?
• Which factors influence high-quality and low-quality ratings (SUPER
important, as these factors may be similar to how Google measures
page quality for SERP rankings)
• Organic SERP listings are the natural listings generated by search
engines based on a series of metrics that determine their
relevance to the searched term.
• Webpages that score well on a search engine's algorithmic test
appear in this list.
Main metrics
• 1. Count: the number of times that the domain appeared in the
searches that we made.
• 2. Mean: the mean (average) rank of each of the domains.
• 3. Coverage: the count divided by the number of queries.
Keywords
• Total appearances: the number of times the domain appeared in
SERPs (top 10)
• Coverage: total appearances divided by the total queries (shown as a
percentage)
• Average position (rank) of each of the domains (see the computation sketch below)
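A rough sketch of how these metrics could be computed from a SERP results DataFrame. The column names ('searchTerms', 'displayLink', 'rank') follow the convention of adv.serp_goog output, but treat them as assumptions:

import pandas as pd

# serp_df: one row per result (assumed columns: 'searchTerms', 'displayLink', 'rank').
def domain_metrics(serp_df: pd.DataFrame) -> pd.DataFrame:
    num_queries = serp_df['searchTerms'].nunique()   # total queries made
    top10 = serp_df[serp_df['rank'] <= 10]           # first-page results only
    summary = (top10.groupby('displayLink')['rank']
                    .agg(count='count', mean='mean'))
    summary['coverage'] = summary['count'] / num_queries  # appearances per query
    return summary.sort_values('count', ascending=False)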
• Keywords: "coffee" and "cafe"
• The requests were run in all available countries, many of which don't
speak English or other languages in which "cafe" is used.
• Country weights: probably the most important issue. The
visualization and counting assume all countries are equal in value. They
are not, whether in terms of population, GDP, coffee consumption, etc.
(a weighting sketch follows below).
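One possible fix, sketched with purely invented weights, is to weight each country's results before counting rather than treating every country as 1:

# Hypothetical weights (e.g. by population or market size); the numbers are invented.
country_weights = {'us': 0.35, 'de': 0.20, 'np': 0.05}

# Each result row counts as its country's weight instead of 1
# (assumes serp_df has a 'gl' column with the country code of the request).
serp_df['weight'] = serp_df['gl'].map(country_weights).fillna(0.01)
weighted_counts = serp_df.groupby('displayLink')['weight'].sum()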
The gl parameter of the adv.serp_goog function stands for
"geo-location". The valid gl values are available as a dictionary.
As you can see, the number of domains ranking for "cafe" is almost 2.5 times that of
"coffee". It makes sense, because the former is more of a local keyword, and the
geo-location makes a difference. So Google is probably giving more local domains for
each country. On the other hand, "coffee" is a generic term for the plant/drink, so the
location doesn't play an important role.
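The 2.5x figure comes from counting distinct domains per keyword, roughly like this (continuing with the serp_df assumed above):

# Number of unique domains appearing for each keyword across all countries.
domains_per_query = serp_df.groupby('searchTerms')['displayLink'].nunique()
print(domains_per_query)   # 'cafe' is expected to show far more distinct domains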
Summary of the number of appearances (rank_count) and
the average position (rank_mean) for the domains that are in
our top_df (a DataFrame filtered to contain only results that are
in top_domains).
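A sketch of how top_df and this summary might be built; top_domains and the cutoff of 15 are assumptions consistent with the earlier sketches:

# Keep only the most frequently appearing domains, then summarize per keyword.
top_domains = serp_df['displayLink'].value_counts().head(15).index
top_df = serp_df[serp_df['displayLink'].isin(top_domains)]

summary = (top_df.groupby(['searchTerms', 'displayLink'])['rank']
                 .agg(rank_count='count', rank_mean='mean')
                 .reset_index())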
For the query "coffee":
Wikipedia's average rank is higher, but what about
the number of appearances and coverage?
Have any queries?
Don’t ask me.
Search yourself
References
[1] https://www.semrush.com/blog/analyzing-search-engine-results-pages/
[2] https://www.kaggle.com/eliasdabbas/coffee-and-cafe-search-engine-rankings-on-google
Thank you
