Tolmachev Alexander Web Search Engines

A brief overview of how web search engines work

  1. Web search engines
     Alexander Tolmachev, gr. #3057/2
  2. 2. Contents Introduction: what do web search engines mean for us today? History of web search engines How web search engines work Most popular search engines Conclusion: past, present and future of web search 2
  3. Contents ➔ Introduction: what do web search engines mean for us today?
  4. The Web as a huge storage of information
     • A huge amount of information is contained in the World Wide Web, and this amount keeps growing day by day
     • We need to find our way in this enormous information space
     • Web search engines let us quickly find the information we are interested in
  5. Web search engines in our life
     • We use web search engines every day for:
       ◦ Searching texts, articles, books, news, etc.
       ◦ Searching different media: music, videos, films, pictures, etc.
       ◦ Searching goods
       ◦ Searching web sites and web portals
       ◦ Preparing lectures and presentations ☺
       ◦ …
     • The verb “to google” is included in dictionaries
     • Web search engines have become an integral part of our life
  6. Contents ➔ History of web search engines
  7. The very first search tools
     • 1989–1991 – the invention of the World Wide Web by Sir Tim Berners-Lee at CERN
     • Archie (1990)
       ◦ The first Internet search tool
       ◦ Fetched and indexed file listings from FTP servers
       ◦ Provided search over the indexed files
     • Veronica and Jughead – Archie-like search tools for the Gopher protocol, which was introduced in 1991
  8. 8. The first web search engines W3Catalog (1993)  The first primitive search engine  Mirroring and integration of manually maintained catalogues  Still available: http://www.w3catalog.com/ World Wide Web Wanderer (1993)  The first web crawler  The first web index called Wandex  Aimed to count Web size, not to serve as a search tool 8
  9. 9. The first web search engines JumpStation (1993)  The first web search engine combining crawling, indexing and searching  A web form for search queries  No ranking, just listing search results Excite (1994)  The first ranking system WebCrawler (1994)  Indexing full text  The first widely known web search engine 9
  10. 10. Web search evolution 1994–1997 – a number of similar web search engines:  Infoseek  OpenText  Magellan  Inktomi  Northern Light  AskJeeves  AltaVista 10
  11. 11. Web search evolution Yahoo! (1994)  Search in human edited hierarchical web directory  Manual solution of relevancy  Search by keywords as well as browsing full directory  Gained large popularity  Later in 2004 developed its own web search engine  One of the main stars in business world in 1990s 11
  12. 12. Web search evolution Google (1998)  The invention of Page Rank  Simple and clear interface instead of turning to a web portal Yandex (1997)  Full-text search with Russian morphology support  Quickly gained large popularity in Russia 12
  13. 13. Web search engines today Powerful web search technologies  Maximal freshness of results  Variety of types of searchable documents  Intelligent algorithms of ranking Media search:  Images  Music  Videos  … 13
  14. 14. Web search engines today Personalized search  Based on users search history  Based on personal information from virtual social spaces Location-based search Vertical search Image-based search Audio-based search 14
  15. Contents ➔ How web search engines work
  16. Basic principles of web search
      • Create and sort a pool of data
      • Find the most appropriate information
      • Deliver this information
  17. Basic parts of a web search engine
      • A web spider/crawler/robot – a computer program which:
        ◦ Continuously traverses web pages
        ◦ Finds new or changed content
        ◦ Stores visited pages in a corpus
      • Index – a database containing the crawling results
      • Search engine – a computer program which:
        ◦ Identifies pages relevant to a search query
        ◦ Retrieves these pages
        ◦ Ranks them
      • User interface
  18. 18. Web crawling Web crawling is aimed to traverse web pages and to store their copies for further indexing General web crawler algorithm:  Starts with a list of initial URLs, called the seeds  Visits these URLs  Retrieves required information from the page  Identifies all the hyper-links on the page  Adds this links to the queue of URLs, called the crawl frontier  Recursively visit URLs from the crawl frontier 18
  19. [Figure: Web crawler architecture]
  20. 20. Crawling policies A selection policy  Focused crawling  Restricting followed links  URL normalization  Path-ascending crawling A re-visit policy  Uniform policy  Proportional policy A politeness policy A parallelization policy 20
  21. 21. Indexing Indexing is purposed to provide high speed and performance in finding relevant documents in corpus for a search query. For example 10,000 documents:  Queried within milliseconds with the help of index  Sequential scan could take hours Meta search engines reuse the indices of other services and do not store a local index  E.g. vertical search can use indices of vertical services 21
  22. 22. Inverted index For each word stores a list of documents containing this word Provides direct access to the documents associated with each word in the search query Commonly used by web search engines Not convenient to update 22
  23. 23. Forward index Stores a list of words for each document Its more handy to store words per document immediately during its parsing Enables asynchronous processing – mush easy to update then inverted index Is stored to be transformed to inverted index 23
  24. 24. Ranking Ranking is an arrangement of web search results in order of relevance Usually based on statistical methods  Frequency of keywords in particulat document  Rating page popularity and authority Advanced search engines also use intelligent algorithms of ranking 24
  25. 25. Google PageRank PageRank was invented in 1998 by Larry Page and Sergey Brin at Stanford University It is aimed to rate web page authority relatively to other web pages Basic principles:  A hyperlink to a page counts as a vote of support  Page with high number of incoming links has high authority  A hyperlink coming from authoritative web page gives more points PR(p) is a probability that a person randomly clicking on links will arrive at page p 25
  26. Google PageRank
      [Figure: PageRank values for four pages A, B, C, D shown in three panels: (1) 0.25 each; (2) A = 1/2, B = C = D = 1/6; (3) A = 6/17, B = 2/17, C = 3/17, D = 6/17]
  27. 27. Google PageRank So, PageRank of page A: In the general case, the PageRank value for any page u: where Bu – set containing all pages linking to page u; L(v) – number of links from page v. 27
  28. 28. Google PageRank Spider traps: A B C Damp factor  d – probability that random surfer continue traversal  (1-d) – probability of going to random site The result formula: 28
  29. [Figure: Web search engine architecture]
  30. Contents ➔ Most popular search engines
  31. 31. Google Was started in 1996 as the research project of Larry Page and Sergey Brin in Stanford University Was launched in 1998 By the end of 1998 already had an index of about 60 million pages Quickly gained popularity due to PageRank algorithm 31
  32. 32. Google Today Google is the most popular web search engine in the world: 85% of web search market Provides many other services:  Gmail  Google maps  Google+  … Has its own OS – Android Provides web browser – Google Chrome ... 32
  33. 33. Yandex Was founded in 1997 by Arkady Volozh and Ilya Segalovich The first web search engine providing morphological search The prototype of Yandex search engine was a system for autimated searching in Bible The name stand for “Yet Another iNDEXer” 33
  34. 34. Yandex In 1998 Yandex launched contextual advertisement In 2001 Yandex.Direct was launched - an automated, auction-based system for placement of text-based advertising 2005 – Ukraine portal, www.yandex.ua 2008 – Yandex Labs in San Francisco Bay area 2010 – English version of web search engine 2011 - search engine and a range of other services in Turkey, at yandex.com.tr 34
  35. [Figure: Yandex]
  36. 36. Yandex today 63% of Russian web search market More than 3500 employees 24 offices in 8 countries 36
  37. Contents ➔ Conclusion: past, present and future of web search
  38. 38. Conclusion Web search engines are an integral part of our life today They did a long way before they reached todays performance and power Their development is far from being finished Main developing trends are:  Web search personalization  Local-based search  Vertical search 38
  39. Your questions, please
  40. Thank you for your time!