Tolmachev Alexander Web Search Engines

A brief overview of how web search engines work

Published in: Technology, Design

  1. 1. Web search engines Alexander Tolmachev gr. #3057/2
  2. 2. Contents • Introduction: what do web search engines mean for us today? • History of web search engines • How web search engines work • Most popular search engines • Conclusion: past, present and future of web search 2
  3. 3. Contents ➔ Introduction: what do web search engines mean for us today? • History of web search engines • How web search engines work • Most popular search engines • Conclusion: past, present and future of web search 3
  4. 4. The Web as a huge store of information. A huge amount of information is contained in the World Wide Web, and this amount is still growing day by day. We need to orient ourselves in this enormous information space. Web search engines let us quickly find the information we are interested in. 4
  5. 5. Web search engines in our life. We use web search engines every day for: • Searching for texts, articles, books, news, etc. • Searching for media: music, videos, films, pictures, etc. • Searching for goods • Searching for web sites and web portals • Preparing lectures and presentations ☺ • … The verb “to google” is included in dictionaries. Web search engines have become an integral part of our life. 5
  6. 6. Contents ✔ Introduction: what do web search engines mean for us today? ➔ History of web search engines • How web search engines work • Most popular search engines • Conclusion: past, present and future of web search 6
  7. 7. The very first search tools. 1989–1991 – the invention of the World Wide Web by Sir Tim Berners-Lee at CERN. Archie (1990): • The first Internet search tool • Fetched and indexed files on FTP servers • Provided search over the indexed files. Veronica and Jughead – search tools similar to Archie for the Gopher protocol, which was invented in 1991. 7
  8. 8. The first web search engines. W3Catalog (1993): • The first primitive search engine • Mirroring and integration of manually maintained catalogues • Still available: http://www.w3catalog.com/ World Wide Web Wanderer (1993): • The first web crawler • The first web index, called Wandex • Aimed at measuring the size of the Web, not at serving as a search tool 8
  9. 9. The first web search engines. JumpStation (1993): • The first web search engine combining crawling, indexing and searching • A web form for search queries • No ranking, just a listing of search results. Excite (1994): • The first ranking system. WebCrawler (1994): • Full-text indexing • The first widely known web search engine 9
  10. 10. Web search evolution. 1994–1997 – a number of similar web search engines: • Infoseek • OpenText • Magellan • Inktomi • Northern Light • AskJeeves • AltaVista 10
  11. 11. Web search evolution. Yahoo! (1994): • Search in a human-edited hierarchical web directory • Relevance determined manually • Search by keywords as well as browsing the full directory • Gained great popularity • Later, in 2004, developed its own web search engine • One of the main stars of the business world in the 1990s 11
  12. 12. Web search evolution. Google (1998): • The invention of PageRank • A simple and clear interface instead of turning into a web portal. Yandex (1997): • Full-text search with Russian morphology support • Quickly gained great popularity in Russia 12
  13. 13. Web search engines today. Powerful web search technologies: • Maximal freshness of results • A variety of searchable document types • Intelligent ranking algorithms. Media search: • Images • Music • Videos • … 13
  14. 14. Web search engines today. Personalized search: • Based on the user's search history • Based on personal information from virtual social spaces. Location-based search. Vertical search. Image-based search. Audio-based search. 14
  15. 15. Contents ✔ Introduction: what do web search engines mean for us today? ✔ History of web search engines ➔ How web search engines work • Most popular search engines • Conclusion: past, present and future of web search 15
  16. 16. Basic principles of web search: • Create and sort a pool of data • Find the most appropriate information • Deliver this information 16
  17. 17. Basic parts of a web search engine. A web spider/crawler/robot – a computer program which: • Continuously traverses web pages • Finds new or changed content • Stores visited pages in the corpus. Index – a database containing the crawling results. Search engine – a computer program which: • Identifies pages relevant to the search query • Retrieves these pages • Ranks them. User interface. 17
  18. 18. Web crawling. Web crawling aims to traverse web pages and store copies of them for further indexing. General web crawler algorithm: • Starts with a list of initial URLs, called the seeds • Visits these URLs • Retrieves the required information from each page • Identifies all the hyperlinks on the page • Adds these links to the queue of URLs, called the crawl frontier • Recursively visits URLs from the crawl frontier 18
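The crawler algorithm on the slide above can be sketched roughly as follows. This is a minimal illustration, not the crawler of any real search engine: the `crawl` function, the `fetch_links` callback and the `TOY_WEB` dictionary are hypothetical names, and a small in-memory link graph stands in for real HTTP fetching and HTML parsing.

```python
from collections import deque

def crawl(seeds, fetch_links, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited."""
    frontier = deque(seeds)   # the crawl frontier: URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, so nothing is visited twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)              # "store" the page (here: just record it)
        for link in fetch_links(url):    # hyperlinks found on the page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

# Hypothetical in-memory "web": each page maps to the links it contains.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}

order = crawl(["a.example"], lambda url: TOY_WEB.get(url, []))
print(order)  # ['a.example', 'b.example', 'c.example', 'd.example']
```

The sketch uses a queue and a loop rather than literal recursion; the `seen` set is what keeps the link from `c.example` back to `a.example` from causing an endless crawl.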
  19. 19. Web crawler architecture 19
  20. 20. Crawling policies. A selection policy: • Focused crawling • Restricting followed links • URL normalization • Path-ascending crawling. A re-visit policy: • Uniform policy • Proportional policy. A politeness policy. A parallelization policy. 20
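As an illustration of one item from the selection policy, URL normalization, here is a small sketch using Python's standard `urllib.parse`. The `normalize_url` function and the particular rules it applies (lowercase scheme and host, drop the default port, drop the fragment, use `/` for an empty path) are assumptions chosen for the example; real crawlers apply their own, usually longer, rule sets. The sketch also discards any user name or password in the URL.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_url(url):
    """Return a canonical form of url so that equivalent URLs compare equal."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it differs from the scheme's default port.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # The fragment is dropped: it is never sent to the server.
    return urlunsplit((scheme, host, path, parts.query, ""))

print(normalize_url("HTTP://Example.COM:80/index.html#top"))
# http://example.com/index.html
```

With such a function the crawler can store normalized URLs in its set of seen pages and avoid fetching the same page under several spellings.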
  21. 21. Indexing. Indexing is intended to provide high speed and performance in finding the documents in the corpus that are relevant to a search query. For example, for 10,000 documents: • A query takes milliseconds with the help of an index • A sequential scan could take hours. Meta search engines reuse the indices of other services and do not store a local index • E.g. vertical search can use the indices of vertical services 21
  22. 22. Inverted index. For each word, stores a list of the documents containing that word. Provides direct access to the documents associated with each word in the search query. Commonly used by web search engines. Not convenient to update. 22
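A minimal sketch of an inverted index, assuming whitespace tokenization and lowercase matching (real engines use far more elaborate text processing). The function names and the `DOCS` sample are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())   # intersect the posting sets
    return result

DOCS = {
    1: "web search engines crawl the web",
    2: "an index speeds up search",
    3: "crawlers feed the index",
}
index = build_inverted_index(DOCS)
print(sorted(search(index, "search")))     # [1, 2]
print(sorted(search(index, "the index")))  # [3]
```

Answering a query only touches the entries for the query words, which is why lookup is fast; adding or changing a document, on the other hand, means updating the entry of every word it contains.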
  23. 23. Forward index. Stores a list of words for each document. It is more convenient to store the words of each document immediately while parsing it. Enables asynchronous processing and is much easier to update than an inverted index. Is stored in order to be transformed into an inverted index. 23
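The relationship between the two index types can be sketched as below: a forward index is filled document by document during parsing and then inverted in a separate step. Both function names are hypothetical, and whitespace tokenization is an assumption made for brevity.

```python
def build_forward_index(docs):
    """Map each document id to its distinct words, in order of first appearance."""
    forward = {}
    for doc_id, text in docs.items():
        # dict.fromkeys removes duplicates while preserving order.
        forward[doc_id] = list(dict.fromkeys(text.lower().split()))
    return forward

def invert(forward):
    """Turn a forward index (doc -> words) into an inverted index (word -> doc ids)."""
    inverted = {}
    for doc_id, words in forward.items():
        for word in words:
            inverted.setdefault(word, []).append(doc_id)
    return {word: sorted(ids) for word, ids in inverted.items()}

forward = build_forward_index({1: "web search web", 2: "search index"})
print(forward)          # {1: ['web', 'search'], 2: ['search', 'index']}
print(invert(forward))  # {'web': [1], 'search': [1, 2], 'index': [2]}
```

Updating a document only replaces its own entry in the forward index, which is what makes it the easier structure to maintain.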
  24. 24. Ranking. Ranking is the arrangement of web search results in order of relevance. Usually based on statistical methods: • Frequency of keywords in a particular document • Rating of page popularity and authority. Advanced search engines also use intelligent ranking algorithms. 24
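The simplest of the statistical signals above, keyword frequency, could look like this in code. It is a deliberately naive sketch: the `rank_by_term_frequency` function is a hypothetical name, and raw counts are used without the length normalization or authority signals a real engine would add.

```python
from collections import Counter

def rank_by_term_frequency(docs, query):
    """Order document ids by how often the query words occur in each document."""
    query_words = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[word] for word in query_words)
        if score > 0:                    # documents with no match are left out
            scores[doc_id] = score
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

docs = {
    1: "web search web crawler",
    2: "search engine",
    3: "cooking recipes",
}
print(rank_by_term_frequency(docs, "web search"))  # [1, 2]
```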
  25. 25. Google PageRank. PageRank was invented in 1998 by Larry Page and Sergey Brin at Stanford University. It aims to rate the authority of a web page relative to other web pages. Basic principles: • A hyperlink to a page counts as a vote of support • A page with a high number of incoming links has high authority • A hyperlink coming from an authoritative web page gives more points. PR(p) is the probability that a person randomly clicking on links will arrive at page p. 25
  26. 26. Google PageRank [Figure: example graph of pages A, B, C, D shown with three sets of values (A, B, C, D): 0.25, 0.25, 0.25, 0.25; then 1/2, 1/6, 1/6, 1/6; then 6/17, 2/17, 3/17, 6/17] 26
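  27. 27. Google PageRank. So, PageRank of page A: (formula missing from the transcript). In the general case, the PageRank value for any page u: $PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}$, where $B_u$ is the set containing all pages linking to page u and $L(v)$ is the number of links from page v. 27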
  28. 28. Google PageRank. Spider traps: [Figure: pages A, B, C]. Damping factor: • d – the probability that the random surfer continues following links • (1-d) – the probability of jumping to a random page. The resulting formula: 28
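The formula itself did not survive in the transcript. A commonly used form, assumed here, is PR(u) = (1 - d)/N + d · Σ PR(v)/L(v), summed over the pages v linking to u, where N is the total number of pages (N is not defined on the slides). The sketch below iterates that form; the `pagerank` function, the default d = 0.85, the iteration count and the way pages without outgoing links are handled are all assumptions made for the example.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps every page to the list of pages it links to.

    Every link target must itself be a key of links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}       # start from a uniform distribution
    for _ in range(iterations):
        # (1 - d) / n: the surfer jumps to a random page.
        new_rank = {page: (1.0 - d) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = d * rank[page] / len(outgoing)   # d * PR(v) / L(v)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += d * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical three-page graph: A -> B, A -> C, B -> C, C -> A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({page: round(value, 3) for page, value in ranks.items()})
```

Because of the (1 - d) term, some rank always flows back to every page, so a group of pages that only link to each other cannot absorb all of it.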
  29. 29. Web Search Engine Architecture 29
  30. 30. Contents ✔ Introduction: what do web search engines mean for us today? ✔ History of web search engines ✔ How web search engines work ➔ Most popular search engines • Conclusion: past, present and future of web search 30
  31. 31. Google. Started in 1996 as a research project of Larry Page and Sergey Brin at Stanford University. Launched in 1998. By the end of 1998 it already had an index of about 60 million pages. Quickly gained popularity thanks to the PageRank algorithm. 31
  32. 32. Google. Today Google is the most popular web search engine in the world: 85% of the web search market. Provides many other services: • Gmail • Google Maps • Google+ • … Has its own OS – Android. Provides a web browser – Google Chrome ... 32
  33. 33. Yandex. Founded in 1997 by Arkady Volozh and Ilya Segalovich. The first web search engine providing morphological search. The prototype of the Yandex search engine was a system for automated searching of the Bible. The name stands for “Yet Another iNDEXer”. 33
  34. 34. Yandex. In 1998 Yandex launched contextual advertising. In 2001 Yandex.Direct was launched – an automated, auction-based system for placing text-based advertising. 2005 – Ukrainian portal, www.yandex.ua. 2008 – Yandex Labs in the San Francisco Bay Area. 2010 – English version of the web search engine. 2011 – search engine and a range of other services in Turkey, at yandex.com.tr 34
  35. 35. Yandex 35
  36. 36. Yandex today: • 63% of the Russian web search market • More than 3,500 employees • 24 offices in 8 countries 36
  37. 37. Contents ✔ Introduction: what do web search engines mean for us today? ✔ History of web search engines ✔ How web search engines work ✔ Most popular search engines ➔ Conclusion: past, present and future of web search 37
  38. 38. Conclusion. Web search engines are an integral part of our life today. They came a long way before reaching today's performance and power. Their development is far from finished. The main development trends are: • Web search personalization • Location-based search • Vertical search 38
  39. 39. Your questions, please 39
  40. 40. Thank you for your time! 40
