Tolmachev Alexander Web Search Engines

  1. Web search engines
     Alexander Tolmachev, gr. #3057/2
  2. Contents
     - Introduction: what do web search engines mean for us today?
     - History of web search engines
     - How web search engines work
     - Most popular search engines
     - Conclusion: past, present and future of web search
  3. Contents
     ➔ Introduction: what do web search engines mean for us today?
     - History of web search engines
     - How web search engines work
     - Most popular search engines
     - Conclusion: past, present and future of web search
  4. The Web as a huge store of information
     - The World Wide Web contains a huge amount of information
     - And this amount keeps growing day by day
     - We need a way to orient ourselves in this enormous information space
     - Web search engines let us quickly find the information we are interested in
  5. Web search engines in our life
     - We use web search engines every day for:
       - searching for texts, articles, books, news, etc.
       - searching for media: music, videos, films, pictures, etc.
       - searching for goods
       - searching for websites and web portals
       - preparing lectures and presentations ☺
       - …
     - The verb "to google" is included in dictionaries
     - Web search engines have become an integral part of our life
  6. Contents
     ✔ Introduction: what do web search engines mean for us today?
     ➔ History of web search engines
     - How web search engines work
     - Most popular search engines
     - Conclusion: past, present and future of web search
  7. The very first search tools
     - 1989–1991: the invention of the World Wide Web by Sir Tim Berners-Lee at CERN
     - Archie (1990)
       - The first Internet search tool
       - Fetched and indexed file listings from FTP servers
       - Provided search over the indexed files
     - Veronica and Jughead: Archie-like search tools for the Gopher protocol, which was invented in 1991
  8. The first web search engines
     - W3Catalog (1993)
       - The first primitive search engine
       - Mirrored and integrated manually maintained catalogues
       - Still available: http://www.w3catalog.com/
     - World Wide Web Wanderer (1993)
       - The first web crawler
       - Built the first web index, called Wandex
       - Aimed to measure the size of the Web, not to serve as a search tool
  9. The first web search engines
     - JumpStation (1993)
       - The first web search engine combining crawling, indexing and searching
       - A web form for search queries
       - No ranking: results were simply listed
     - Excite (1994)
       - The first ranking system
     - WebCrawler (1994)
       - Indexed full text
       - The first widely known web search engine
  10. Web search evolution
      - 1994–1997: a number of similar web search engines:
        - Infoseek
        - OpenText
        - Magellan
        - Inktomi
        - Northern Light
        - AskJeeves
        - AltaVista
  11. Web search evolution
      - Yahoo! (1994)
        - Search in a human-edited hierarchical web directory
        - Relevance decided manually
        - Search by keywords as well as browsing the full directory
        - Gained great popularity
        - Later, in 2004, developed its own web search engine
        - One of the main stars of the business world in the 1990s
  12. Web search evolution
      - Google (1998)
        - The invention of PageRank
        - A simple, clean interface instead of a web portal
      - Yandex (1997)
        - Full-text search with support for Russian morphology
        - Quickly gained great popularity in Russia
  13. Web search engines today
      - Powerful web search technologies
        - Maximum freshness of results
        - A wide variety of searchable document types
        - Intelligent ranking algorithms
      - Media search:
        - Images
        - Music
        - Videos
        - …
  14. Web search engines today
      - Personalized search
        - Based on the user's search history
        - Based on personal information from social networks
      - Location-based search
      - Vertical search
      - Image-based search
      - Audio-based search
  15. Contents
      ✔ Introduction: what do web search engines mean for us today?
      ✔ History of web search engines
      ➔ How web search engines work
      - Most popular search engines
      - Conclusion: past, present and future of web search
  16. Basic principles of web search
      - Create and sort a pool of data
      - Find the most appropriate information
      - Deliver this information
  17. Basic parts of a web search engine
      - A web spider (crawler, robot): a computer program which
        - continuously traverses web pages
        - finds new or changed content
        - stores visited pages in a corpus
      - Index: a database containing the crawling results
      - Search engine: a computer program which
        - identifies pages relevant to a search query
        - retrieves these pages
        - ranks them
      - User interface
  18. Web crawling
      - Web crawling aims to traverse web pages and store copies of them for later indexing
      - General web crawler algorithm:
        - Start with a list of initial URLs, called the seeds
        - Visit these URLs
        - Extract the required information from each page
        - Identify all the hyperlinks on the page
        - Add these links to the queue of URLs to visit, called the crawl frontier
        - Recursively visit URLs from the crawl frontier
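The general crawler algorithm above can be sketched as a breadth-first traversal. This is a minimal sketch, assuming a hypothetical in-memory `WEB` dictionary in place of real HTTP fetching and HTML link extraction; `fetch`, `crawl` and `max_pages` are illustrative names, not part of any real crawler. The "recursive" visiting is done iteratively with a FIFO queue, and a set of already-queued URLs keeps the crawler from visiting a page twice.

```python
from collections import deque

# Hypothetical in-memory "web" (an assumption for this sketch):
# URL -> (page text, list of outgoing links). A real crawler would fetch
# pages over HTTP and extract links from the HTML instead.
WEB = {
    "http://a.example/": ("home page", ["http://a.example/news", "http://b.example/"]),
    "http://a.example/news": ("news page", ["http://a.example/"]),
    "http://b.example/": ("partner page", ["http://c.example/"]),
    "http://c.example/": ("leaf page", []),
}


def fetch(url):
    """Stand-in for an HTTP request: returns (text, links), or None if unreachable."""
    return WEB.get(url)


def crawl(seeds, max_pages=100):
    frontier = deque(seeds)  # the crawl frontier; FIFO order gives breadth-first crawling
    seen = set(seeds)        # every URL ever queued, so no page is queued twice
    corpus = {}              # URL -> stored copy of the page, kept for later indexing
    while frontier and len(corpus) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:     # unreachable page: skip it
            continue
        text, links = page
        corpus[url] = text
        for link in links:   # add newly discovered hyperlinks to the frontier
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return corpus


print(list(crawl(["http://a.example/"])))
# ['http://a.example/', 'http://a.example/news', 'http://b.example/', 'http://c.example/']
```

The `max_pages` limit is a crude stand-in for the selection and re-visit policies a real crawler applies to decide what to fetch next.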
  19. Web crawler architecture
  20. Crawling policies
      - A selection policy
        - Focused crawling
        - Restricting followed links
        - URL normalization
        - Path-ascending crawling
      - A re-visit policy
        - Uniform policy
        - Proportional policy
      - A politeness policy
      - A parallelization policy
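URL normalization, one of the selection-policy items above, lets a crawler recognize that differently written URLs point to the same resource, so it does not fetch the same page twice. A minimal sketch using Python's standard `urllib.parse`; the rules chosen here (lowercase scheme and host, drop the default port and the fragment, use `/` for an empty path) are a common but assumed subset, and `normalize_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing the resource (assumed rule set).
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Normalize an absolute URL; user:password@ parts and IPv6 hosts are not handled in this sketch."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the scheme's default.
    netloc = host if port is None or port == DEFAULT_PORTS.get(scheme) else f"{host}:{port}"
    path = parts.path or "/"          # "http://x.com" and "http://x.com/" are the same page
    # The fragment (#...) is dropped: it never changes what the server returns.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


print(normalize_url("HTTP://Example.COM:80/a/b?x=1#top"))  # http://example.com/a/b?x=1
```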
  21. Indexing
      - Indexing aims to make finding the documents in the corpus that are relevant to a search query fast and efficient
      - For example, for 10,000 large documents:
        - a query against the index takes milliseconds
        - a sequential scan could take hours
      - Meta search engines reuse the indices of other services and do not store a local index
        - E.g. vertical search can use the indices of vertical services
  22. Inverted index
      - Stores, for each word, the list of documents containing that word
      - Gives direct access to the documents associated with each word in the search query
      - Commonly used by web search engines
      - Not convenient to update
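A toy inverted index can be built as a dictionary mapping each word to the set of documents that contain it; a multi-word query is then answered by intersecting those sets instead of scanning every document. This sketch assumes simple lowercase word tokenization and AND semantics for queries; `tokenize`, `build_inverted_index` and `search` are illustrative names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (a deliberately simple assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """docs: doc_id -> text. Returns word -> set of doc_ids containing that word."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """AND semantics: return the documents that contain every query word."""
    postings = [index.get(word, set()) for word in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)


docs = {
    1: "web search engines",
    2: "web crawler visits web pages",
    3: "search engine ranking",
}
index = build_inverted_index(docs)
print(search(index, "web search"))       # {1}
print(sorted(search(index, "search")))   # [1, 3]
```

Adding or changing a single document means touching the entry of every word it contains, which is why the slide calls the inverted index inconvenient to update.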
  23. Forward index
      - Stores the list of words for each document
      - It is convenient to record a document's words directly while parsing it
      - Enables asynchronous processing and is much easier to update than an inverted index
      - Stored in order to be transformed into an inverted index
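Transforming a forward index into an inverted index can be sketched as emitting (word, document) pairs, sorting them, and grouping by word. This is a minimal in-memory sketch; real engines perform the inversion in batches over data far larger than memory, and `invert` is a hypothetical name.

```python
from itertools import groupby


def invert(forward_index):
    """forward_index: doc_id -> list of words (in parsing order, duplicates allowed).
    Returns word -> sorted list of doc_ids containing that word."""
    # One (word, doc_id) pair per distinct word in each document.
    pairs = sorted(
        (word, doc_id)
        for doc_id, words in forward_index.items()
        for word in set(words)
    )
    # After sorting, all pairs for the same word are adjacent, so groupby collects them.
    return {word: [doc_id for _, doc_id in group] for word, group in groupby(pairs, key=lambda p: p[0])}


forward = {1: ["web", "search", "web"], 2: ["search", "engine"]}
print(invert(forward))  # {'engine': [2], 'search': [1, 2], 'web': [1]}
```

Updating a document only rewrites its own forward-index entry; the inversion can then be rerun later, which is what makes the forward index easier to maintain.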
  24. Ranking
      - Ranking is the arrangement of web search results in order of relevance
      - Usually based on statistical methods:
        - frequency of keywords in a particular document
        - rating of page popularity and authority
      - Advanced search engines also use intelligent ranking algorithms
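Keyword-frequency ranking can be illustrated by scoring each document by how often the query words occur in it, normalized by the document's length. This is a deliberately simple term-frequency scheme assumed for illustration; real engines use more elaborate weighting (for example TF-IDF) together with other signals such as page authority. `rank_by_keyword_frequency` is a hypothetical name.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def rank_by_keyword_frequency(docs, query):
    """Return doc_ids containing at least one query word, highest score first.
    Score = occurrences of query words / number of words in the document."""
    terms = tokenize(query)
    scores = {}
    for doc_id, text in docs.items():
        words = tokenize(text)
        if not words:
            continue
        counts = Counter(words)
        score = sum(counts[t] for t in terms) / len(words)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: scores[d], reverse=True)


docs = {
    "a": "search engines rank search results",
    "b": "web crawler",
    "c": "search the web",
}
print(rank_by_keyword_frequency(docs, "web search"))  # ['c', 'b', 'a']
print(rank_by_keyword_frequency(docs, "search"))      # ['a', 'c']
```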
  25. Google PageRank
      - PageRank was invented in 1998 by Larry Page and Sergey Brin at Stanford University
      - It aims to rate a web page's authority relative to other web pages
      - Basic principles:
        - A hyperlink to a page counts as a vote of support
        - A page with a high number of incoming links has high authority
        - A hyperlink coming from an authoritative web page gives more points
        - PR(p) is the probability that a person randomly clicking on links will arrive at page p
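  26. Google PageRank
      [Figure: example link graph of four pages A, B, C and D, shown in three panels with PageRank values A = B = C = D = 0.25; A = 1/2, B = C = D = 1/6; and A = 6/17, B = 2/17, C = 3/17, D = 6/17]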
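  27. Google PageRank
      - So, the PageRank of page A is the sum of the shares passed to it by the pages linking to it, where each page splits its own PageRank equally among its outgoing links
      - In the general case, the PageRank value for any page u:
        $$PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}$$
        where B_u is the set containing all pages linking to page u, and L(v) is the number of links from page v.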
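The general formula can be evaluated by starting from equal values and repeatedly applying it to every page until the values stop changing (power iteration). A minimal sketch on a hypothetical three-page graph (A links to B and C, B links to C, C links to A); it assumes every page has at least one outgoing link, since the formula divides by L(v). `pagerank_simple` is an illustrative name.

```python
def pagerank_simple(links, iterations=100):
    """links: page -> list of pages it links to (every page assumed to have >= 1 link).
    Repeatedly applies PR(u) = sum over v in B_u of PR(v) / L(v)."""
    pages = list(links)
    pr = {p: 1 / len(pages) for p in pages}    # start from a uniform distribution
    for _ in range(iterations):
        new = {p: 0.0 for p in pages}
        for v, outs in links.items():
            share = pr[v] / len(outs)           # PR(v) / L(v)
            for u in outs:                      # v is in B_u for each u it links to
                new[u] += share
        pr = new
    return pr


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print({p: round(v, 3) for p, v in pagerank_simple(graph).items()})
# {'A': 0.4, 'B': 0.2, 'C': 0.4}
```

The result can be checked by hand: the formula gives PR(A) = PR(C) and PR(B) = PR(A)/2, and with the values summing to 1 this yields 0.4, 0.2 and 0.4.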
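  28. Google PageRank
      - Spider traps (illustrated with pages A, B and C): a group of pages that link only to each other, so a random surfer who enters the group never leaves it and the group absorbs all the PageRank
      - Damping factor
        - d: probability that the random surfer continues following links
        - (1 - d): probability of jumping to a random page
      - The resulting formula, where N is the total number of pages:
        $$PR(u) = \frac{1 - d}{N} + d \sum_{v \in B_u} \frac{PR(v)}{L(v)}$$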
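The damped formula only changes how each update starts: every page first receives (1 - d)/N and then d times its share of incoming rank. A minimal sketch, assuming d = 0.85 (the value commonly quoted for PageRank) and a hypothetical graph in which B and C form a spider trap (A links to B; B and C link only to each other). Without damping, A's rank would drop to 0 and B and C would swap their rank back and forth forever; with damping the iteration converges. Spreading the rank of pages without outgoing links evenly over all pages is an assumed convention, not something the slides specify.

```python
def pagerank(links, d=0.85, iterations=100):
    """links: page -> list of pages it links to.
    Repeatedly applies PR(u) = (1 - d) / N + d * sum over v in B_u of PR(v) / L(v)."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}  # random-jump part of the formula
        for v, outs in links.items():
            if outs:
                share = d * pr[v] / len(outs)   # d * PR(v) / L(v)
                for u in outs:
                    new[u] += share
            else:
                # Dangling page (no outgoing links): spread its rank evenly (assumed convention).
                for u in pages:
                    new[u] += d * pr[v] / n
        pr = new
    return pr


trap = {"A": ["B"], "B": ["C"], "C": ["B"]}
print({p: round(v, 3) for p, v in pagerank(trap).items()})
# {'A': 0.05, 'B': 0.486, 'C': 0.464}
```

The trap pages still collect most of the rank, but page A keeps the minimum (1 - d)/N = 0.05 and the values settle instead of oscillating.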
  29. Web search engine architecture
  30. Contents
      ✔ Introduction: what do web search engines mean for us today?
      ✔ History of web search engines
      ✔ How web search engines work
      ➔ Most popular search engines
      - Conclusion: past, present and future of web search
  31. Google
      - Started in 1996 as a research project by Larry Page and Sergey Brin at Stanford University
      - Launched in 1998
      - By the end of 1998 it already had an index of about 60 million pages
      - Quickly gained popularity thanks to the PageRank algorithm
  32. Google
      - Today Google is the most popular web search engine in the world, with 85% of the web search market
      - Provides many other services:
        - Gmail
        - Google Maps
        - Google+
        - …
      - Has its own OS: Android
      - Provides a web browser: Google Chrome
      - ...
  33. Yandex
      - Founded in 1997 by Arkady Volozh and Ilya Segalovich
      - The first web search engine to provide morphological search
      - The prototype of the Yandex search engine was a system for automated searching in the Bible
      - The name stands for "Yet Another iNDEXer"
  34. Yandex
      - 1998: Yandex launched contextual advertising
      - 2001: Yandex.Direct was launched, an automated, auction-based system for placing text-based advertising
      - 2005: Ukrainian portal, www.yandex.ua
      - 2008: Yandex Labs in the San Francisco Bay Area
      - 2010: English version of the web search engine
      - 2011: search engine and a range of other services in Turkey, at yandex.com.tr
  35. Yandex
  36. Yandex today
      - 63% of the Russian web search market
      - More than 3,500 employees
      - 24 offices in 8 countries
  37. Contents
      ✔ Introduction: what do web search engines mean for us today?
      ✔ History of web search engines
      ✔ How web search engines work
      ✔ Most popular search engines
      ➔ Conclusion: past, present and future of web search
  38. Conclusion
      - Web search engines are an integral part of our life today
      - They have come a long way to reach today's performance and power
      - Their development is far from finished
      - The main development trends are:
        - web search personalization
        - location-based search
        - vertical search
  39. Your questions, please
  40. Thank you for your time!
