FIT5 Ch. 5, CIS 110 13F

Ch.5 presentation from Fluency w/Information Technology, 5ed (Pearson)

FIT5 Ch. 5, CIS 110 13F Presentation Transcript

  • Chapter 5: Locating Information on the WWW (Wednesday, October 16, 2013)
  • How a Search Engine Works: A. The Web Crawler
      • software robots (called spiders or bots) => spiders crawl the web to build an index (keywords & web pages)
      • example index entry:
          TOKEN   URL
          cat     www.cat.com, icanhascheezburger.com
      Copyright © 2013 Pearson Education, Inc. Publishing as Pearson Addison-Wesley
  • How a Search Engine Works: the Web Crawler
      • Web crawler: a program that indexes content on the web
      • Algorithm:
          – Start from one "seed" page
          – Extract all links on that page
          – Follow each link to find new pages
          – Extract all links from new pages
          – Keep going ...
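
The crawler algorithm on this slide can be sketched in a few lines. This is only a minimal illustration, not how any real search engine is built: the "web" here is a made-up dictionary (`TOY_WEB`, with hypothetical page names), and the breadth-first visiting order is an assumption, since the slide does not say in which order links are followed.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the pages it links to.
TOY_WEB = {
    "seed.example": ["a.example", "b.example"],
    "a.example": ["b.example", "c.example"],
    "b.example": ["seed.example"],
    "c.example": [],
}

def crawl(seed, web):
    """Visit pages breadth-first from the seed; return them in discovery order."""
    seen = {seed}              # pages already discovered
    order = []                 # pages in the order they were visited
    frontier = deque([seed])   # pages waiting to be visited
    while frontier:
        page = frontier.popleft()
        order.append(page)
        for link in web.get(page, []):   # "extract all links on that page"
            if link not in seen:         # only follow links to new pages
                seen.add(link)
                frontier.append(link)
    return order

print(crawl("seed.example", TOY_WEB))
# ['seed.example', 'a.example', 'b.example', 'c.example']
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as `b.example` does here.
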
  • How a Search Engine Works: B. The Query Processor
      • user enters search terms (keywords)
      • query processor looks up each word in the index
      • returns hit list
      • index is created in advance
      • index is stored in RAM => fast query response
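
As a rough sketch of "create index in advance, then look up the word", the snippet below builds a tiny inverted index (token -> set of URLs) and answers a one-word query from it. The two cat URLs come from the earlier slide; the page text, the `www.dog.example` page, and the function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (text is made up).
PAGES = {
    "www.cat.com": "cat food and cat toys",
    "icanhascheezburger.com": "funny cat pictures",
    "www.dog.example": "dog food",
}

def build_index(pages):
    """Build the index ahead of time: each token maps to the URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in text.lower().split():
            index[token].add(url)
    return index

def lookup(index, word):
    """Return the hit list for one keyword (empty if the word was never indexed)."""
    return sorted(index.get(word.lower(), set()))

INDEX = build_index(PAGES)
print(lookup(INDEX, "cat"))
# ['icanhascheezburger.com', 'www.cat.com']
```

The query itself never scans page text; it is a single dictionary lookup, which is why a prebuilt, memory-resident index answers quickly.
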
  • Multiword Searches: set intersection
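
A multiword search can be pictured as intersecting the hit lists of the individual words. The sketch below assumes hypothetical hit lists (`HITS`, with made-up page IDs); starting from the smallest list is a common optimization, not something the slide specifies.

```python
# Hypothetical hit lists: keyword -> set of page IDs containing it.
HITS = {
    "red": {"p1", "p2", "p4"},
    "fish": {"p2", "p3", "p4"},
    "soup": {"p4", "p5"},
}

def multiword_search(hits, words):
    """Return the pages that contain every word in the query."""
    hit_lists = [hits.get(word, set()) for word in words]
    if not hit_lists:
        return set()
    hit_lists.sort(key=len)          # smallest list first keeps the work small
    result = set(hit_lists[0])       # copy so the stored hit list is not modified
    for hit_list in hit_lists[1:]:
        result &= hit_list           # set intersection
    return result

print(sorted(multiword_search(HITS, ["red", "fish"])))          # ['p2', 'p4']
print(sorted(multiword_search(HITS, ["red", "fish", "soup"])))  # ['p4']
```
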
  • Power of Indexed Search
      • Search engines can look at billions of Web pages and return an answer in less than a fifth of a second
  • Data Centers
      • Search index is RAM-resident
          – RAM is about 100,000x faster than disk
          – Hennessy/Patterson (4ed) memory access times:
              » Register: 250 ps
              » L1 cache: 1 ns
              » RAM: 100 ns
              » Hard disk: 10 ms (SSD flash: 100 µs)
      • => Data centers: a growth industry in Oregon
      • Why? Data Centers as Information Substations
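
The "100,000x" figure follows directly from the access times listed on this slide (10 ms for disk vs. 100 ns for RAM). A quick check, with the times converted to picoseconds so the division is exact; the dictionary and function names are just for illustration:

```python
# Access times from the slide, in picoseconds so the arithmetic stays exact.
ACCESS_TIME_PS = {
    "register": 250,
    "l1_cache": 1_000,             # 1 ns
    "ram": 100_000,                # 100 ns
    "hard_disk": 10_000_000_000,   # 10 ms
}

def times_faster(fast, slow):
    """How many times faster the `fast` level is than the `slow` level."""
    return ACCESS_TIME_PS[slow] / ACCESS_TIME_PS[fast]

print(times_faster("ram", "hard_disk"))   # 100000.0
print(times_faster("register", "ram"))    # 400.0
```
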
  • Google’s Data Centers
      • Google’s facility in The Dalles is only one of two dozen, which stretch from Silicon Valley to Dublin.
      • #servers: 1,000,000 - 2,000,000
      • 2 exabytes of hard disk storage
          – enough to copy the web
          – “The Indexed Web contains at least 3.59 billion pages (Tuesday, 15 October, 2013).”
      • 8 petabytes of RAM
      • Field Trip: Google’s Data Centers
  • datacenterknowledge.com
      • rapid growth in data center electricity use from 2000 to 2005 slowed significantly from 2005 to 2010
      • 2010: total electricity use by all data centers was about 1.3% of all electricity use for the world (2% for the US)
      • => Google’s entire global data center network: 220 megawatts
  • Data Center Energy Efficiency
      • PUE (power usage effectiveness)
      • standard from the Green Grid consortium
      • measures how much power goes directly to computing vs. cooling, lighting, etc.
      • Score of 1: no power goes to the extra costs
      • 1.5 means that ancillary services consume half as much power as the computing equipment itself
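
PUE is defined as total facility power divided by the power delivered to the computing (IT) equipment. The sketch below works through that ratio; the kilowatt figures and function names are invented for illustration, not measurements from any real data center.

```python
def pue(it_power_kw, overhead_power_kw):
    """PUE = total facility power / IT equipment power."""
    return (it_power_kw + overhead_power_kw) / it_power_kw

def overhead_share_of_total(pue_value):
    """Fraction of the facility's total power that goes to cooling, lighting, etc."""
    return (pue_value - 1) / pue_value

# Hypothetical facility: 1000 kW of computing plus 500 kW of cooling and lighting.
print(pue(1000, 500))                           # 1.5
print(pue(1000, 0))                             # 1.0 (no power goes to extras)
print(round(overhead_share_of_total(1.5), 3))   # 0.333
```

Note the two ways of reading a score: at a PUE of 1.5 the overhead equals half of the computing power, which is one third of the facility's total power.
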
  • Data Center Energy Efficiency
      • Google PUE: 1.1 => 11% to cooling, etc.
      • 6 Things You’d Never Guess About Google’s Energy Use
      • Read more
  • What Search Engines Look At
      – Title: <title> element contains key words
      – Anchor text: <a> element, describes the page it links to
      – Landing page: <a> element, the page it connects to
      – Meta: a <meta> tag in the head section, often used for key words
      – Alt attributes: <img> element attribute, gives a textual description
      – Content: text on the page
  • PageRank Algorithm: Pioneered by Google
      • PageRank works like a voting system
          – If page A links to page B, A’s link adds to B’s importance
          – Pages linked-to by many pages have a high page rank
          – Links from pages with a high page ranking are ranked as more important
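
The voting idea can be sketched as a simplified iterative PageRank on a made-up four-page link graph. This is a textbook-style approximation, not Google's production algorithm; the damping factor of 0.85, the iteration count, and the `LINKS` graph are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank: each page splits its current rank among the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}          # start with equal rank
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its vote over every page.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share             # a link is a weighted "vote"
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "A": ["B"],
    "B": ["A"],
    "C": ["B"],
    "D": ["A", "B"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph B collects the most links and ends up ranked highest; A comes second largely because its vote comes from the highly ranked B; C and D, which nobody links to, stay at the minimum.
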
  • Field Trip: Basic Search
      • Google Search Education http://bit.ly/16ZW6Ow
  • Advanced Search: Logic Ops
      • logic operator: AND
          – human AND powered AND flight
          – AND-queries: hits have all the words
      • logic operator: OR
          – marshmallow OR strawberry OR chocolate
          – OR-queries: hits have at least one word
      • logic operator: NOT
          – tigers AND NOT baseball
  • Combining Logical Operators
      • (marshmallow OR strawberry) AND sundae
      • logic operators work like arithmetic: parentheses group the operations that are evaluated first
      • Google also uses a minus (–) as an abbreviation for NOT
          – http://www.powersearchingwithgoogle.com/course/ps/assets/PowerSearchingQuickReference.pdf
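
One way to picture the logic operators is as set operations on hit lists: AND is intersection, OR is union, and AND NOT is set difference. The sketch below uses the slides' example queries on a made-up collection of pages (`PAGES` and the page IDs are hypothetical) and shows why the parentheses matter.

```python
# Hypothetical pages: page ID -> words on the page.
PAGES = {
    "p1": {"marshmallow", "fluff"},
    "p2": {"strawberry", "sundae"},
    "p3": {"strawberry", "shortcake"},
    "p4": {"chocolate", "sundae"},
    "p5": {"tigers", "baseball"},
    "p6": {"tigers", "zoo"},
}

def hits(word):
    """Hit list for one word: the set of pages that contain it."""
    return {page for page, words in PAGES.items() if word in words}

# (marshmallow OR strawberry) AND sundae
grouped = (hits("marshmallow") | hits("strawberry")) & hits("sundae")
print(sorted(grouped))      # ['p2']

# marshmallow OR (strawberry AND sundae): different grouping, different hits
regrouped = hits("marshmallow") | (hits("strawberry") & hits("sundae"))
print(sorted(regrouped))    # ['p1', 'p2']

# tigers AND NOT baseball
print(sorted(hits("tigers") - hits("baseball")))   # ['p6']
```
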
  • Site Search
      • Many sites offer the opportunity to perform a site search
      • e.g., try this Google search: Google chief economist Hal Varian, site:uoregon.edu
  • Field Trip: Power Search
      • Google Search Education http://www.powersearchingwithgoogle.com/
  • Alternatives to the Search Giant
      • How Wolfram|Alpha Works
  • Cloud Storage
      • Facebook: 300 petabytes (PB)
      • Microsoft Hotmail: 100 PB
      • Microsoft SkyDrive: 10 PB
      • Amazon S3: 900 PB
      • Dropbox: 40 PB
  • Ch. 5: Assessment
      • Learning Outcomes - Know the following