FIT5 Ch. 5, CIS 110 13F

Ch.5 presentation from Fluency w/Information Technology, 5ed (Pearson)

  1. 1. Chapter 5 Locating Information on the WWW Wednesday, October 16, 13
  2. 2. How a Search Engine Works: A. The Web Crawler
     • Software robots (called spiders or bots)
     • => Spiders crawl the web to build an index (keywords & web pages)
     • Example index entry:
       TOKEN   URL
       cat     www.cat.com, icanhascheezburger.com
     Copyright © 2013 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Wednesday, October 16, 13
  3. 3. How a Search Engine Works: the Web Crawler
     • Web crawler: a program that indexes content on the web
     • Algorithm:
       – Start from one "seed" page
       – Extract all links on that page
       – Follow each link to find new pages
       – Extract all links from new pages
       – Keep going ...
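The crawl algorithm on this slide can be sketched as a breadth-first traversal. This is only an illustration under stated assumptions: `TOY_WEB` is a hypothetical in-memory stand-in for the web (no network access), and `crawl` is a name chosen here, not something from the slides.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the links found on it.
TOY_WEB = {
    "seed.example": ["a.example", "b.example"],
    "a.example": ["b.example", "c.example"],
    "b.example": ["seed.example"],
    "c.example": [],
}


def crawl(seed, web):
    """Return pages in the order a breadth-first crawler discovers them."""
    visited = [seed]          # discovery order, starting from the seed page
    seen = {seed}             # pages already found, so none is queued twice
    queue = deque([seed])
    while queue:              # "keep going" until no new pages remain
        page = queue.popleft()
        for link in web.get(page, []):   # extract all links on that page
            if link not in seen:
                seen.add(link)
                visited.append(link)
                queue.append(link)       # follow the link later
    return visited


print(crawl("seed.example", TOY_WEB))
# ['seed.example', 'a.example', 'b.example', 'c.example']
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as `b.example` does here.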
  4. 4. [Slide with no extracted text]
  5. 5. How a Search Engine Works: B. The Query Processor
     • User enters search terms (keywords)
     • Query processor looks up each word in the index
     • Returns a hit list
     • The index is created in advance
     • The index is stored in RAM => fast query response
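A minimal sketch of the index-then-lookup idea, assuming a toy set of pages. `PAGES`, `build_index`, and `lookup` are hypothetical names used only for illustration; the page texts are invented, and a real search engine index is far more elaborate.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (invented for this example).
PAGES = {
    "www.cat.com": "cat machines and engines",
    "icanhascheezburger.com": "funny cat pictures",
    "example.org/dogs": "funny dog pictures",
}


def build_index(pages):
    """Build the index in advance: token -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in text.lower().split():
            index[token].add(url)
    return index


def lookup(index, word):
    """Query processor: look the word up and return a sorted hit list."""
    return sorted(index.get(word.lower(), set()))


INDEX = build_index(PAGES)
print(lookup(INDEX, "cat"))
# ['icanhascheezburger.com', 'www.cat.com']
```

Because the index is built before any query arrives, answering a query is a single dictionary lookup rather than a scan of every page.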
  6. 6. Multiword Searches: set intersection
  7. 7. Multiword Searches: set intersection
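A multiword search can be illustrated by intersecting the hit sets of the individual words. The index contents and page IDs below are made up for the example, and `multiword_search` is a hypothetical helper name.

```python
# Hypothetical index: word -> set of page IDs that contain it.
INDEX = {
    "human": {"p1", "p2", "p4"},
    "powered": {"p2", "p3", "p4"},
    "flight": {"p2", "p4", "p5"},
}


def multiword_search(index, words):
    """Return the pages that contain every word (set intersection)."""
    hit_sets = [index.get(word, set()) for word in words]
    if not hit_sets:
        return set()
    return set.intersection(*hit_sets)


print(sorted(multiword_search(INDEX, ["human", "powered", "flight"])))
# ['p2', 'p4']
```

A word that is missing from the index contributes an empty set, so the whole intersection is empty, which matches the idea that every word must appear on a hit.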
  8. 8. Power of Indexed Search
     • Search engines can look at billions of Web pages and return an answer in less than a fifth of a second
  9. 9. Data Centers
     • Search index is RAM-resident
       – RAM is 100,000x faster than disk
       – Hennessy/Patterson (4ed) memory access times:
         » Register: 250 ps
         » L1 cache: 1 ns
         » RAM: 100 ns
         » Hard disk: 10 ms (SSD flash: 100 µs)
     • => Data centers: a growth industry in Oregon
     • Why? Data Centers as Information Substations
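The "100,000x" figure follows directly from the access times listed on this slide (10 ms for hard disk versus 100 ns for RAM). A small check, with all times converted to picoseconds so the arithmetic stays in whole numbers; the dictionary and function names are just for this example.

```python
# Access times from the slide, converted to picoseconds.
ACCESS_TIME_PS = {
    "register": 250,                 # 250 ps
    "L1 cache": 1_000,               # 1 ns
    "RAM": 100_000,                  # 100 ns
    "hard disk": 10_000_000_000,     # 10 ms
}


def slowdown(slower, faster):
    """How many times longer one access takes than another."""
    return ACCESS_TIME_PS[slower] // ACCESS_TIME_PS[faster]


print(slowdown("hard disk", "RAM"))   # 100000
print(slowdown("RAM", "register"))    # 400
```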
  10. 10. Google’s Data Centers
      – Google’s facility in The Dalles is only one of two dozen, which stretch from Silicon Valley to Dublin.
      – #servers: 1,000,000 - 2,000,000
        • 2 exabytes of hard disk storage
          – enough to copy the web
        • “The Indexed Web contains at least 3.59 billion pages (Tuesday, 15 October, 2013).”
        • 8 petabytes of RAM
      – Field Trip: Google’s Data Centers
  11. 11. datacenterknowledge.com
      • Rapid growth in data center electricity use from 2000 to 2005 slowed significantly from 2005 to 2010
      • 2010: total electricity use by all data centers was about 1.3% of all electricity use for the world (2% for the US)
      • => Google’s entire global data center network: 220 megawatts
  12. 12. Data Center Energy Efficiency
      • PUE (power usage effectiveness)
      • Standard from the Green Grid consortium
      • Measures how much power goes directly to computing vs. cooling, lighting, etc.
      • Score of 1: no power goes to the extra costs
      • 1.5 means that ancillary services consume half as much power as the computing itself (one third of the total)
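PUE is total facility power divided by the power delivered to the computing equipment, so the overhead can be read off in two ways: relative to computing power, or as a share of the total. The sketch below works through both; the function names and the 150 kW / 100 kW figures are invented for illustration.

```python
def pue(total_power_kw, it_power_kw):
    """PUE = total facility power / power used by computing equipment."""
    return total_power_kw / it_power_kw


def overhead_per_computing_watt(pue_value):
    """Extra power (cooling, lighting, ...) per unit of computing power."""
    return pue_value - 1.0


def overhead_share_of_total(pue_value):
    """Fraction of all facility power that goes to the extras."""
    return (pue_value - 1.0) / pue_value


# Hypothetical facility: 150 kW in total, 100 kW of it for computing.
example = pue(150.0, 100.0)
print(example)                                          # 1.5
print(overhead_per_computing_watt(example))             # 0.5
print(round(overhead_share_of_total(example), 3))       # 0.333
```

A PUE of exactly 1 gives zero overhead under both readings, which matches the "score of 1" bullet.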
  13. 13. Data Center Energy Efficiency
      • Google PUE: about 1.1 => cooling, etc. add roughly 10% on top of the power used for computing
      • 6 Things You’d Never Guess About Google’s Energy Use
      • Read more
  14. 14. What Search Engines Look At
      – Title: the <title> element contains key words
      – Anchor text: in the <a> element, describes the page it links to
      – Landing page: the page an <a> element connects to
      – Meta: a <meta> tag in the head section, often used for key words
      – Alt attributes: <img> element attribute that gives a textual description
      – Content: text on the page
  15. 15. Page Rank Algorithm: Pioneered by Google
      • PageRank works like a voting system
        – If page A links to page B, A’s link adds to B’s importance
        – Pages linked to by many pages have a high page rank
        – Links from pages with a high page rank count as more important
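The voting idea can be sketched as a simple iterative calculation. This is a simplified textbook-style version, not Google's production algorithm: the damping factor of 0.85, the iteration count, the `pagerank` function name, and the four-page `LINKS` graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively share each page's rank among the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}   # start with equal ranks
    for _ in range(iterations):
        # Every page keeps a small baseline rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page's vote is split evenly among its outgoing links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "A": ["B"],
    "B": ["A"],
    "C": ["B"],
    "D": ["A", "B"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph B is linked to by three pages and ends up with the highest rank, while C and D, which nobody links to, keep only the baseline.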
  16. 16. Field Trip: Basic Search
      • Google Search Education http://bit.ly/16ZW6Ow
  17. 17. Advanced Search: Logic Ops
      • Logic operator: AND
        – human AND powered AND flight
        – AND-queries: hits have all words
      • Logic operator: OR
        – marshmallow OR strawberry OR chocolate
        – OR-queries: hits have at least one word
      • Logic operator: NOT
        – tigers AND NOT baseball
  18. 18. Combining Logical Operators
      (marshmallow OR strawberry) AND sundae
      • Logic operators work like arithmetic: parentheses group the operations
      • Google also uses a minus (-) as an abbreviation for NOT
        – http://www.powersearchingwithgoogle.com/course/ps/assets/PowerSearchingQuickReference.pdf
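One way to picture the logic operators is as set operations on hit lists: AND is intersection, OR is union, and AND NOT is set difference. The index contents and page IDs below are invented for the example, and `hits` is a hypothetical helper.

```python
# Hypothetical index: word -> set of page IDs that contain it.
INDEX = {
    "marshmallow": {"p1", "p2"},
    "strawberry": {"p2", "p3"},
    "chocolate": {"p4"},
    "sundae": {"p2", "p3", "p4", "p5"},
    "tigers": {"p6", "p7"},
    "baseball": {"p7"},
}


def hits(word):
    """Pages containing the word (empty set if the word is not indexed)."""
    return INDEX.get(word, set())


# marshmallow OR strawberry OR chocolate: at least one word appears.
or_query = hits("marshmallow") | hits("strawberry") | hits("chocolate")

# (marshmallow OR strawberry) AND sundae: parentheses group the OR first.
combined = (hits("marshmallow") | hits("strawberry")) & hits("sundae")

# tigers AND NOT baseball: remove pages that mention baseball.
and_not = hits("tigers") - hits("baseball")

print(sorted(or_query))   # ['p1', 'p2', 'p3', 'p4']
print(sorted(combined))   # ['p2', 'p3']
print(sorted(and_not))    # ['p6']
```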
  19. 19. Site Search
      • Many sites offer the opportunity to perform a site search
      • E.g., try this Google search: Google chief economist Hal Varian, site:uoregon.edu
  20. 20. Field Trip: Power Search
      • Google Search Education http://www.powersearchingwithgoogle.com/
  21. 21. Alternatives to the Search Giant
      • How Wolfram|Alpha Works
  22. 22. Cloud Storage
      • Facebook: 300 petabytes (PB)
      • Microsoft Hotmail: 100 PB
      • Microsoft SkyDrive: 10 PB
      • Amazon S3: 900 PB
      • Dropbox: 40 PB
  23. 23. Ch. 5: Assessment Learning Outcomes - Know the following
