How Search Engines Work - A Free Guide for SEO


Every day Google answers more than one billion questions from people around the globe in 181 countries and 146 languages. 15% of the searches it sees every day it has never seen before. Technology makes this possible: Google can create computing programs, called "algorithms," that handle the immense volume and breadth of search requests. Google is just at the beginning of what's possible, and it is constantly looking for better solutions. Google has more engineers working on search today than at any time in the past.

Published in: Education, Technology, Design


  1. 1. FirstPageTraffic is a team of 40 talented consultants and online marketers, founded in 2013, with 7+ years of industry experience. The company combines its consultants' many years of experience in SEO and pay-per-click marketing to provide clients with tangible results. An advanced guide by FirstPageTraffic. Address: 8th Floor Manhatten, Kessel 1Valley, Plot No TZ9, Tech Zone, Greater Noida, Uttar Pradesh, India. Phone: +1-213-674-6736 E-Mail: FB: Tw: G+ :
  2. 2. Introduction The most exciting thing about the internet, and its most visible component, the World Wide Web, is that millions of pages exist online, waiting to offer information on an implausibly wide array of topics. The downside is the same: of those millions of pages, most are titled according to their author's whim, and nearly all of them sit on servers under cryptic or hidden names. When users need to learn about a particular topic, how do they decide which pages to read and which to skip? Unless they know the technical workings of the web well, they simply turn to an internet search engine to resolve the query.
  3. 3. Internet search engines are special websites designed to help people find information held on other websites. Different search engines work in different ways, but they all perform three basic tasks: • They scan the internet, or selected sections of it, based on important words. • They keep an index of those words and where to find them. • They let users look up keywords, or combinations of keywords, found in that index. Early search engines held an index of a few hundred pages and received perhaps two or three hundred queries a day. Today a leading search engine indexes hundreds of millions of pages and responds to billions of queries every day. This guide explains how these core jobs are performed, and how internet search engines put the pieces together to find the information one needs online.
  4. 4. Approximately 240 million people in the United States regularly use the internet, and last year their activity generated almost $170 billion in commerce, including online transactions and online advertising. How search works 1. A search engine sends "bots," or web crawlers, to copy websites and build an index of everything present on the internet. 2. When a query is entered, the engine actually looks through the index it has created rather than the web itself. This lets search engines deliver results swiftly, frequently in less than a second. The user is returned listings of both "natural" search results and "paid" results, which are delivered alongside the natural results based on the query. Depending on the query, a search can return hundreds of pages of results. 3. When one types in a search term, the engine uses a proprietary algorithm to organize and prioritize the results it identifies as likely to be relevant to
  5. 5. the query. Changes to those algorithms can greatly affect a website's prospects for success, since they can determine whether a site ranks high or low in response to a particular query. 4. Because the algorithms are very complex, it is extremely difficult to determine when they are modified and what those modifications are. This lack of transparency means algorithms could be programmed to exclude, penalize or promote particular websites, or whole categories of sites. Whether one is an information seller, buyer or seeker, people find each other on the vast internet through search. Being visible in search results is essential for participating in internet commerce and conversation. THE LEADING 3 RESULTS RECEIVE 88% OF USERS' CLICKS
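The crawl-then-index-then-serve steps above can be sketched in miniature. The pages, URLs and index below are invented for illustration, not real search-engine data; the point is that a query is answered from a prebuilt index, never by scanning the web itself:

```python
# Toy illustration: queries are answered from a prebuilt index.
# All pages and URLs here are invented examples.
PAGES = {
    "hotels.example/a": "book cheap hotels and rooms",
    "news.example/b": "live cricket scores and hotels news",
    "sport.example/c": "cricket bats and scores archive",
}

def build_index(pages):
    """Invert page text into word -> set of URLs (the engine's 'index')."""
    index = {}
    for url, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Return URLs containing every query word: a fast index lookup."""
    results = None
    for word in query.lower().split():
        urls = index.get(word, set())
        results = urls if results is None else results & urls
    return sorted(results or [])

index = build_index(PAGES)
print(search(index, "cricket scores"))
# ['news.example/b', 'sport.example/c']
```

Because the lookup touches only the small per-word URL sets, it stays fast no matter how large the crawled collection grows, which is why real engines answer in under a second.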
  6. 6. GOOGLE DOMINATES WEB SEARCH Google dominates search, handling more than 79% of searches in the U.S. and up to 94% in some EU countries.
  7. 7. The Latest in Search Engines
  8. 8. Building SEO-Focused Pages to Serve Topics & People Rather than Keywords & Rankings With updates such as Hummingbird, Google gets better every day at deciding what's relevant to you and what you're searching for. This can genuinely help SEO work, since it means practitioners no longer need to concentrate quite so closely on exact keywords. Yes, SEO has become more complex, and harder; there is a growing disassociation from pure keyword-and-ranking thinking.
  9. 9. Google Has Rebuilt Its Search to Answer Long Questions Better On its 15th anniversary, Google updated the central algorithm that controls the answers one gets to queries on the search engine, in a bid to make them work better for longer, more complicated questions. The update, code-named Hummingbird, is the biggest change to the world's foremost search engine since early 2010, when Google upgraded its algorithm to Caffeine. Google made the change about a month earlier, it revealed at a press event held in the Menlo Park, Calif., garage where Google began; the occasion also commemorated the 15th anniversary of Google's founding. Most people won't notice an explicit difference in search results. However, with more and more people posing more complex queries, particularly as they increasingly speak searches into their smartphones, new mathematical formulas are needed to handle them.
  10. 10. Keyword Ranking Factors
  12. 12. How Search Engines Work These processes lay the groundwork: they are how Google gathers and organizes information on the internet so it can return the most useful results to the user. The index is well over 10,000,000 gigabytes, and Google has spent over one million compute hours building it. About Search Each day Google answers more than one billion queries from people around the world in 146 languages and 182 countries. 16% of the searches it sees every day it has never seen before. Technology makes this possible: Google can create computing programs, known as "algorithms," that handle the massive volume and breadth of search requests. Google is just at the beginning of what's feasible, and it is constantly looking for better solutions. It has more engineers working on search today than at any time in the past. Search relies on human creativity, perseverance and determination. Google's search engineers design algorithms to return high-quality, timely, on-topic answers to people's questions.
  13. 13. 1. CRAWLING Find information by crawling Google uses software known as "web crawlers" to discover publicly available web pages. The best-known crawler is called "Googlebot." Crawlers look at web pages and follow the links on those pages, much as a person would when browsing content on the web. They go from link to link and bring data about those web pages back to Google's servers. The crawl process begins with a list of web addresses from past crawls and from sitemaps provided by website owners. As the crawlers visit these websites, they look for links to other pages to visit. The software pays special attention to new sites, changes to existing sites and dead links. Computer programs determine which sites to crawl, how often, and how many pages to fetch from each site. Google doesn't accept payment to crawl a site more often for its web search results. A search engine cares more about having the best possible results, because in the long run that's what's best for users and, therefore, for the business. Organizing information by indexing The internet is like an ever-growing public library with millions of books and no central filing system. Google essentially gathers the pages during the crawl process and then builds an index, so the search engine knows exactly how to look things up. Much like the index in the back of a book, the Google index comprises information about words and their locations. When one searches, at the most basic level, the algorithms look up the search terms in the index to find the matching pages. The search process gets much more complicated from there. When one searches for "cats," one doesn't want a page with the word "cats" on it hundreds of times; one probably wants videos, images, or a list of breeds.
Google's indexing systems note many different aspects of pages, such as when they were published, whether they include videos and images, and much more. With
  14. 14. the Knowledge Graph, the search engine is continuing to go beyond keyword matching to better understand the people, places and things one cares about. Choices for website owners Most websites don't need to set up restrictions for crawling, indexing or serving, so their pages are eligible to appear in search results without much extra work. That said, website owners have many choices about how Google crawls and indexes their sites, through Webmaster Tools and a file called "robots.txt". With the robots.txt file, site owners can opt out of being crawled by Googlebot, or they can give more specific instructions about how to process the pages on their sites. Site owners also have fine-grained choices and can select how content is indexed on a page-by-page basis. For instance, they can choose to have their pages appear without a snippet or a cached version. Webmasters can also choose to incorporate search into their own pages with Custom Search.
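The crawl loop these slides describe (start from known addresses, follow links, skip dead ones) can be sketched over a made-up in-memory "web". Real crawlers fetch over HTTP, respect robots.txt and prioritize by freshness, all of which this toy omits; the URLs and pages are invented:

```python
from collections import deque

# A made-up miniature web: url -> (page text, outgoing links).
WEB = {
    "/home": ("welcome page",   ["/a", "/b"]),
    "/a":    ("about crawling", ["/b", "/c", "/missing"]),
    "/b":    ("about indexing", ["/home"]),
    "/c":    ("no links here",  []),
}

def crawl(seeds):
    """Breadth-first crawl from seed URLs, fetching each page at most once."""
    seen, queue, fetched = set(seeds), deque(seeds), {}
    while queue:
        url = queue.popleft()
        if url not in WEB:           # a dead link: skip it
            continue
        text, links = WEB[url]
        fetched[url] = text          # bring page data back to the "server"
        for link in links:           # follow links found on the page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

print(sorted(crawl(["/home"])))
# ['/a', '/b', '/c', '/home']
```

The `seen` set is what keeps a crawler from fetching the same page twice even when many pages link to it, which matters once the frontier holds millions of URLs.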
  15. 15. Latest in Web Crawling Google Webmaster Tools Now Locates Smartphone Crawl Errors It can be quite difficult for websites with a large number of smartphone visitors to figure out issues such as 404 errors when only smartphone visitors, or only desktop users, may be affected. Frequently, site owners don't realize there is a problem: most of the time they do troubleshooting and maintenance from a desktop, so they simply don't see the mobile issues unless someone explicitly alerts them. Google Webmaster Tools recognizes this concern, particularly with mobile traffic growing at such a rapid rate. Google has made changes to its crawl errors page to include the specific smartphone crawl errors that Googlebot-Mobile finds while crawling the web as a mobile user agent. Pierre Far, a webmaster trends analyst, has announced that webmasters can now find a wide variety of crawl information and errors for smartphones: • Server errors: A server error is when Googlebot receives an HTTP error status code while crawling the page. • Not found errors and soft 404s: A page can show a "not found" message to Googlebot, either by returning an HTTP 404 status code or when the page is detected as a soft error page. • Faulty redirects: A faulty redirect is a smartphone-specific error that occurs when a desktop page redirects smartphone users to a page that is not relevant to their query. A typical example is when all pages on the desktop site send smartphone users to the smartphone-optimized site's homepage. • Blocked URLs: A blocked URL is when the site's robots.txt explicitly forbids crawling by Googlebot for smartphones. Usually such smartphone-specific robots.txt disallow directives are unintended, so one should examine the server configuration if blocked URLs are reported in Webmaster Tools. The mobile crawl errors are now live in Webmaster Tools.
Simply log into the account, click on "Crawl Errors" in the "Crawl" submenu, and choose the Smartphone tab to view any crawl errors from the website.
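The robots.txt rules behind the "blocked URLs" error, and the crawler choices described on the earlier slide, can be checked programmatically. Python's standard-library `urllib.robotparser` evaluates the same rules a crawler would; the rules below are an illustrative example, not any real site's file:

```python
from urllib.robotparser import RobotFileParser

# Rules a site owner might publish at /robots.txt (illustrative content):
# Googlebot may crawl everything except /private/; other bots are blocked.
rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "/page.html"))   # True
print(parser.can_fetch("Googlebot", "/private/x"))   # False
print(parser.can_fetch("OtherBot", "/page.html"))    # False
```

A crawler-specific `User-agent` block overrides the `*` default, which is exactly how an accidental smartphone-only disallow can block Googlebot for smartphones while desktop crawling still works.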
  16. 16. 2. ALGORITHMS For a typical query, there are thousands, if not millions, of web pages with useful information. Algorithms are the computer processes and formulas that take the queries and turn them into answers. Today Google's algorithms rely on more than 300 unique signals or "clues" that make it possible to guess what one may really be looking for. These signals include things like the terms on websites, the uniqueness of content, the region and PageRank.
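As a toy illustration of blending signals into one score, the sketch below combines term frequency, a title match and a PageRank-like value using invented weights; real ranking uses hundreds of signals and far subtler math:

```python
# Invented weights for three toy "signals"; real systems use many more.
WEIGHTS = {"title_match": 3.0, "term_frequency": 1.0, "pagerank": 2.0}

def score(page, terms):
    """Blend simple relevance clues into a single ranking score."""
    words = page["body"].lower().split()
    tf = sum(words.count(t) for t in terms) / max(len(words), 1)
    title_hit = any(t in page["title"].lower().split() for t in terms)
    return (WEIGHTS["title_match"] * title_hit
            + WEIGHTS["term_frequency"] * tf
            + WEIGHTS["pagerank"] * page["pagerank"])

pages = [
    {"title": "Cambridge hotels", "body": "hotels in cambridge and beyond", "pagerank": 0.9},
    {"title": "News",             "body": "hotels hotels hotels",           "pagerank": 0.1},
]
# The second page repeats the term, but title match and link authority win.
best = max(pages, key=lambda p: score(p, ["hotels"]))
print(best["title"])   # Cambridge hotels
```

Combining independent signals this way is what lets a page that merely repeats a keyword lose to a page that is actually about the topic.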
  17. 17. For each search query performed on Google, whether it's [hotels in Cambridge] or [cricket scores], there are thousands, if not millions, of web pages with useful information. The challenge in search is to return only the most relevant results at the top of the page, sparing people from sifting through the less relevant results below. Not every website can appear at the top of the page, or even show up on the first page of search results. Today the algorithms rely on more than 300 unique signals, some of which one would expect, such as how frequently the search terms appear on the webpage, whether they appear in the title, or whether synonyms of the search terms occur on the page. Google has made many innovations in search to improve the answers one finds. The first and most renowned is PageRank, named for Larry Page (Google's co-founder and CEO). PageRank works by counting the number and quality of links to a page to arrive at a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. "[Google] has every reason to do whatever it takes to conserve its algorithm’s longstanding reputation for distinction. If customers start to regard it as anything less than good, it won’t be good for anyone—except other search engines." Harry McCracken, TIME, 3/3/2011
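PageRank's count-and-weight idea can be sketched with the standard power-iteration method; the three-page graph and the damping factor below are illustrative choices, not Google's actual parameters:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by repeatedly letting pages 'vote' along their links."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            for q in outs:
                # each page splits its current rank among the pages it links to
                new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# "hub" is linked to by both other pages, so it accumulates the most rank.
graph = {"a": ["hub"], "b": ["hub"], "hub": ["a"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))   # hub
```

Note how "a" ends up ranked above "b" even though both have one incoming link: a link from the important "hub" page is worth more than no link at all, which is the "quality of links" idea in miniature.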
  18. 18. 3. Fighting Spam Spam sites try to game their way to the top of search results through techniques such as repeating keywords over and over, buying links that pass PageRank, or putting invisible text on the screen. This is bad for search because relevant websites get buried, and it's bad for legitimate website owners because their sites become harder to find. The good news is that Google's algorithms can detect the vast majority of spam and demote it automatically. For the rest, the search engine has teams who manually review sites. Identifying spam Spam sites come in all shapes and sizes. Some are automatically generated gibberish that no human could make sense of. Of course, the search engine also sees sites using subtler spam techniques. Examples of "pure spam" are sites using the most aggressive spam techniques; Google has published a stream of live spam screenshots it has manually identified and recently removed from search results. Types of spam There are many other kinds of spam that the search engine detects and takes action on: • Parked domains • Cloaking and/or sneaky redirects • Spammy free hosts and dynamic DNS providers • Hidden text and/or keyword stuffing • Pure spam • Hacked sites • Thin content with little or no added value
  19. 19. • Unnatural links from a site • User-generated spam • Unnatural links to a site Taking action While the algorithms address the vast majority of spam, the search engine addresses other spam manually to prevent it from affecting the quality of the results. This chart shows the number of domains that have been affected by a manual action over time, broken down by spam type. The numbers might look large out of context, but the web is a very big place: a recent snapshot of the index showed that about 0.22% of domains had been manually marked for removal. Manual action by month
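Keyword stuffing, one of the techniques listed above, can be caught with even a crude heuristic. The threshold and sample texts below are invented for illustration; real spam classifiers use far richer features:

```python
from collections import Counter

def keyword_density(text):
    """Fraction of all words taken up by the single most repeated word."""
    words = text.lower().split()
    if not words:
        return 0.0
    return Counter(words).most_common(1)[0][1] / len(words)

def looks_stuffed(text, threshold=0.30):
    """Flag text where one word dominates (threshold is an invented guess)."""
    return keyword_density(text) > threshold

spam = "cheap pills cheap pills cheap pills buy cheap pills now"
normal = "our pharmacy stocks a wide range of common medications"
print(looks_stuffed(spam), looks_stuffed(normal))   # True False
```

Simple statistical checks like this scale to the whole index automatically, which is why the bulk of spam can be demoted without human review.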
  20. 20. Notifying website owners When the search engine takes manual action on a website, its staff try to alert the site's owner to help them address the issues. The aim is for website owners to get the information they need to get their websites in shape; that is why, over time, the search engine has invested considerable resources in webmaster communication and outreach. The following graph shows the number of spam notifications sent to website owners through Webmaster Tools. Messages by month
  21. 21. Listening for feedback Manual actions don't last forever. Once website owners clean up their sites and remove the spammy content, they can ask for the site to be reviewed again by filing a reconsideration request. The search engine processes all the reconsideration requests it receives and communicates along the way to let website owners know how it's going. Historically, most sites that have submitted reconsideration requests were not in fact affected by any manual spam action. Often these sites are simply experiencing the usual ebb and flow of online traffic, an algorithmic change, or perhaps technical problems that prevent Google from accessing site content. This chart shows the weekly volume of reconsideration requests since 2006. Reconsideration requests by week
  22. 22. Access to Information Comes First Google believes in free expression and the free flow of information. Search engines strive to make information accessible except in narrowly defined cases such as spam, legal requirements, malware and protection against identity theft.
  23. 23. Algorithms Over Manual Action The relevance and comprehensiveness of the search results is central to helping one find what one is searching for. Search engines prefer machine solutions to manually organizing information. Algorithms are scalable: when one makes an improvement, it makes things better not just for one search results page but for hundreds of millions. Still, there are some cases where search engines use manual controls when machine solutions aren't enough. Exception lists Like most search engines, in some cases the algorithms falsely identify sites, and the search engine makes limited exceptions to improve search quality. For instance, the SafeSearch algorithms are designed to protect children from adult content online. When one of these algorithms misidentifies a website, the search engine sometimes makes a manual exception to prevent that site from being treated as pornography.
  24. 24. Fighting Spam and Malware Search engines hate spam as much as users do. It hurts users by cluttering search results with irrelevant links. Search engines have teams that work to detect spam websites and remove them from the results. The same applies to malware and phishing websites. Transparency for webmasters Search engines publish clear Webmaster Guidelines describing best practices and spammy behavior. When the manual spam team takes action on a website in a way that may visibly affect that website's ranking, search engines try their best to alert the webmaster. After a manual action, webmasters can correct the problem and file a reconsideration request.
  25. 25. Preventing Identity Theft Upon request, the search engine removes personal information from search results if it believes the information could make one vulnerable to specific harm, such as financial fraud or identity theft. This includes sensitive government ID numbers such as U.S. Social Security numbers, credit card numbers, bank account numbers, and images of signatures. Removal of national ID numbers from official government websites is usually not processed, as in those cases the information is deemed public. Requests are sometimes declined when someone appears to be misusing these policies to remove other information from the results. Legal removals At times, the search engine removes content or features from the search results for legal reasons. For instance, content is removed on receipt of a valid notification under the Digital Millennium Copyright Act (DMCA) in the US. Content is also removed from local versions of Google consistent with local law, when notified that the content is at issue; for example, content that unlawfully glorifies a party or that illegally abuses religion under the respective local laws. When content is removed from the search results for legal reasons, a notification is shown that results have been removed, and the removals are reported to a venture run by the Berkman Center for Internet and Society that tracks online restrictions on speech.
  26. 26. Fighting Child Exploitation Search engines block search results that lead to child sexual abuse images. This is both a legal requirement and the right thing to do. Shocking content Search engines want to ensure information is available when one searches for it, but they also need to be careful not to show potentially upsetting content when one has not asked for it. Consequently, certain search features may not be triggered for queries where the results could be disturbing, in various narrowly defined categories.
  27. 27. SafeSearch When it comes to information on the internet, search engines leave it to users to decide what is worth finding. That's why there is a SafeSearch filter, which gives more control over the search experience by helping one avoid adult content if one would rather not see it.
  28. 28. Future Search Searches built from Boolean operators are literal searches: the engine looks for the words or phrases exactly as they are entered. This can be a problem when the entered words have multiple meanings. "Bed," for instance, can be a place for sleeping, a plot where flowers are grown, the storage space of a truck, or a place where fish lay their eggs. A user interested in only one of these meanings may not wish to see pages featuring the others. One area of search engine research is concept-based searching. Some of this research involves using statistical analysis on pages containing the words or phrases one searches for, in order to find other pages one might be interested in. Clearly, the information stored about each page is greater for a concept-based search engine, and far more processing is required for each search. Even so, many groups are working to improve both the results and the performance of this kind of search engine. Others have moved to another area of research, known as natural-language queries. The idea behind natural-language queries is that one can type a question the same way one would ask it of a person sitting nearby; there is no need to keep track of complicated query structures or Boolean operators.
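The literalness of Boolean search, and the "bed" ambiguity, can be seen in a toy evaluator over invented documents. It matches words exactly and has no notion of which sense the searcher intended:

```python
# Invented documents; each uses "bed" in a different sense.
DOCS = {
    "sleep":  "a comfortable bed helps you sleep well",
    "garden": "plant tulips in a raised flower bed",
    "truck":  "the truck bed holds two tons of cargo",
}

def matches(text, query):
    """Evaluate a flat Boolean query: words joined by AND / OR, NOT as a prefix."""
    words = set(text.lower().split())
    result, combine, negate = None, "and", False
    for token in query.lower().split():
        if token == "not":
            negate = True
        elif token in ("and", "or"):
            combine = token
        else:
            hit = (token in words) != negate
            negate = False
            if result is None:
                result = hit
            elif combine == "or":
                result = result or hit
            else:
                result = result and hit
    return bool(result)

# A literal engine returns every sense of "bed"; the user must exclude
# the unwanted senses by hand with NOT terms.
print([k for k, t in DOCS.items() if matches(t, "bed")])
# ['sleep', 'garden', 'truck']
print([k for k, t in DOCS.items() if matches(t, "bed AND NOT truck")])
# ['sleep', 'garden']
```

A concept-based or natural-language engine aims to make the second, hand-tuned query unnecessary by inferring the intended sense from context.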
  29. 29. SEO in 2014: How to Prepare for Google's 2014 Algorithm Updates • Everything learned in 2013 is still relevant, just amplified • Content marketing has gone broader than ever • Social media plays an ever more visible role • Invest in Google+ • Hummingbird was just the tip of the mobile iceberg • The long-form versus short-form debate continues • Marketing and PPC have a shifted relationship with SEO • Guest blogging remains among the most effective tactics, with a caveat
  30. 30. Resources Need help enhancing your visibility? Feel free to write to us. We will help you enhance your search visibility and assist in brand building. Shoot us an email – or Skype us at firstpagetraffic