FIT5 Ch. 5, CIS 110 13F
Chapter 5
Locating Information on the WWW

Wednesday, October 16, 13
How a Search Engine Works
A. The Web Crawler
• Software robots (called spiders or bots) crawl the web
=> to build an index that maps keywords (tokens) to web pages:

TOKEN    URL
cat      www.cat.com
         icanhascheezburger.com
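The TOKEN/URL table is an inverted index: each word points to the pages that contain it. Below is a minimal Python sketch of building one. It assumes the crawler has already fetched each page's text into a dictionary. The `pages` contents and the simple lowercase tokenizer are illustrative assumptions, not how any particular search engine tokenizes text.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every token to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return dict(index)


# Hypothetical page contents, standing in for what a crawler fetched.
pages = {
    "www.cat.com": "All about the domestic cat",
    "icanhascheezburger.com": "Funny cat pictures",
    "www.dog.com": "All about dogs",
}
index = build_index(pages)
print(sorted(index["cat"]))  # ['icanhascheezburger.com', 'www.cat.com']
```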

Copyright © 2013 Pearson Education, Inc. Publishing as Pearson Addison-Wesley

How a Search Engine Works:
the Web Crawler
• Web crawler: a program that indexes
content on the web
• Algorithm:
– Start from one "seed" page
– Extract all links on that page
– Follow each link to find new pages
– Extract all links from the new pages
– Keep going: repeat with each newly found page
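The crawl loop above can be sketched in Python. To keep the sketch self-contained, a dictionary (`links`, hypothetical) stands in for downloading a page and extracting its links. The breadth-first visiting order and the `max_pages` limit are assumptions; the slide does not say in which order new pages are followed.

```python
from collections import deque


def crawl(seed: str, links: dict[str, list[str]], max_pages: int = 1000) -> list[str]:
    """Breadth-first crawl starting from one seed page.

    `links` maps each page to the pages it links to; it stands in for
    fetching the page over the network and parsing its links.
    """
    seen = {seed}
    frontier = deque([seed])
    order = []
    while frontier and len(order) < max_pages:
        page = frontier.popleft()
        order.append(page)                   # "index" this page
        for target in links.get(page, []):   # extract all links on the page
            if target not in seen:           # follow only links to new pages
                seen.add(target)
                frontier.append(target)
    return order


# Hypothetical link graph: page -> pages it links to.
web = {
    "seed.com": ["a.com", "b.com"],
    "a.com": ["c.com", "seed.com"],
    "b.com": ["c.com"],
    "c.com": [],
}
print(crawl("seed.com", web))  # ['seed.com', 'a.com', 'b.com', 'c.com']
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other (here, a.com links back to seed.com).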

How a Search Engine Works:
B. The Query Processor
• The user enters search terms (keywords)
• The query processor looks up each word in the index
• It returns a hit list of matching pages
• The index is created in advance and stored in RAM
=> fast query response
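Because the index is built ahead of time, answering a one-word query is just a lookup. The hypothetical Python sketch below assumes a small in-memory dictionary stands in for the RAM-resident index; real indexes are vastly larger and are split across many machines.

```python
# Built ahead of time (e.g. by the crawler/indexer), then kept in memory.
# Hypothetical index contents for illustration.
INDEX: dict[str, list[str]] = {
    "cat": ["icanhascheezburger.com", "www.cat.com"],
    "dog": ["www.dog.com"],
}


def hit_list(keyword: str) -> list[str]:
    """Return the pages containing `keyword`, or an empty list if none do.

    Answering the query is a single hash-table lookup, not a scan of
    every page, because the index was prepared in advance.
    """
    return INDEX.get(keyword.lower(), [])


print(hit_list("Cat"))    # ['icanhascheezburger.com', 'www.cat.com']
print(hit_list("zebra"))  # []
```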

Multiword Searches:
set intersection
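For a multiword query, the engine looks up each word's hit list and keeps only the pages that appear in all of them. The Python sketch below rests on several assumptions: hit lists are stored as sorted page IDs, the index and the query words are hypothetical, and starting with the shortest list is an optimization choice. It intersects two sorted lists with a linear-time merge; Python's built-in set `&` operator would give the same answer.

```python
def intersect(a: list[int], b: list[int]) -> list[int]:
    """Intersect two sorted lists of page IDs with a linear-time merge."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def multiword_hits(index: dict[str, list[int]], words: list[str]) -> list[int]:
    """Pages containing every word: intersect the words' hit lists."""
    lists = sorted((index.get(w, []) for w in words), key=len)  # shortest first
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


# Hypothetical index: word -> sorted IDs of pages containing it.
index = {"white": [1, 3, 5, 8], "house": [2, 3, 8, 9], "tour": [3, 7, 8]}
print(multiword_hits(index, ["white", "house", "tour"]))  # [3, 8]
```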

Power of Indexed Search
• Search engines can look at billions of Web
pages and return an answer in less than a
fifth of a second

Data Centers
• The search index is RAM-resident
– RAM is about 100,000x faster than disk
– Hennessy/Patterson (4th ed.) memory access times:
» Register: 250 ps
» L1 cache: 1 ns
» RAM: 100 ns
» Hard disk: 10 ms (SSD flash: about 100 µs)

=> Data Centers: a growth industry in
Oregon
• Why?
Data Centers as Information Substations
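The "100,000x" figure follows from the access times listed for this slide: 10 ms / 100 ns = 10^-2 s / 10^-7 s = 10^5. Here is a small Python check that converts each time to integer picoseconds so the ratios come out exact. It assumes an SSD flash access time of 100 µs.

```python
# Access times from the slide, in picoseconds so every value is an integer.
# Assumption: SSD flash access is 100 microseconds.
ACCESS_TIME_PS = {
    "register": 250,
    "L1 cache": 1_000,              # 1 ns
    "RAM": 100_000,                 # 100 ns
    "SSD flash": 100_000_000,       # 100 us
    "hard disk": 10_000_000_000,    # 10 ms
}


def slowdown(slow: str, fast: str) -> int:
    """How many times longer one access to `slow` takes than one to `fast`."""
    return ACCESS_TIME_PS[slow] // ACCESS_TIME_PS[fast]


print(slowdown("hard disk", "RAM"))  # 100000
print(slowdown("SSD flash", "RAM"))  # 1000
print(slowdown("RAM", "L1 cache"))   # 100
```

The first ratio is the slide's "RAM 100,000x faster than disk"; keeping the whole index in RAM avoids paying that cost on every query.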

Google’s Data Centers
– Google’s facility in The Dalles is only one of two
dozen facilities, which stretch from Silicon Valley to
Dublin.
– Number of servers: 1,000,000 to 2,000,000
• 2 exabytes of hard disk storage – enough to copy
the web
• “The Indexed Web contains at least 3.59 billion
pages (Tuesday, 15 October, 2013).”
• 8 petabytes of RAM

– Field Trip: Google’s Data Centers
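The slide's own figures allow a quick back-of-the-envelope check of memory per machine. The Python sketch below assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and assumes the 8 PB of RAM is spread evenly across all servers. Both are simplifications made for this illustration.

```python
PB = 10**15  # decimal petabyte (assumed; binary units would differ slightly)
GB = 10**9   # decimal gigabyte


def ram_per_server_gb(total_ram_pb: float, servers: int) -> float:
    """Average RAM per server, assuming the total is spread evenly."""
    return total_ram_pb * PB / servers / GB


for servers in (1_000_000, 2_000_000):
    print(f"{servers:,} servers -> {ram_per_server_gb(8, servers):g} GB each")
# 1,000,000 servers -> 8 GB each
# 2,000,000 servers -> 4 GB each
```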

datacenterknowledge.com
• Data center electricity use grew rapidly from 2000 to 2005
• Growth slowed significantly from 2005 to 2010
• 2010: total electricity use by all data centers was about 1.3% of all electricity use worldwide (2% in the US)

=> Google’s entire global data center network: 220 megawatts
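Megawatts measure power (a rate), not energy. To turn 220 MW into yearly energy use, multiply by the hours in a year. The Python sketch below assumes 220 MW is a constant average draw over a full 365-day year. The slide does not say whether the figure is an average or a peak.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours (ignoring leap years)


def annual_energy_gwh(average_power_mw: float) -> float:
    """Energy used in one year at a constant average power draw, in GWh."""
    return average_power_mw * HOURS_PER_YEAR / 1_000  # MWh -> GWh


print(annual_energy_gwh(220))  # 1927.2
```

That is roughly 1.9 terawatt-hours per year under the constant-draw assumption.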

Data Center Energy Efficiency
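• PUE (power usage effectiveness)
• A standard from the Green Grid consortium
• Measures how much power goes directly to computing vs. to cooling, lighting, etc.
• PUE = total facility power / power used by the computing equipment

• A score of 1.0 means no power goes to the extra costs
• 1.5 means that ancillary services consume half as much power as the computing equipment (one third of the total)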
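PUE is a simple ratio, so its meaning can be checked with a few lines of Python. The function names and the 1,500 kW / 1,000 kW facility are hypothetical. The point of the sketch is that the overhead can be read two ways: relative to the computing power, or as a share of the total.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw


def overhead_share(pue_value: float) -> dict[str, float]:
    """Express a PUE's overhead relative to IT power and to total power."""
    overhead = pue_value - 1.0  # non-computing power per unit of IT power
    return {
        "vs_computing": overhead,          # 0.5 for a PUE of 1.5
        "vs_total": overhead / pue_value,  # 1/3 for a PUE of 1.5
    }


print(pue(1500, 1000))      # 1.5
print(overhead_share(1.5))  # {'vs_computing': 0.5, 'vs_total': 0.3333333333333333}
```

By the same formula, a PUE of 1.1 means overhead equal to about 10% of the computing power, or about 9% of the facility's total.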

Data Center Energy Efficiency
• Google PUE: 1.1
=> overhead for cooling, etc. is about 10% of the power used for computing (about 9% of the total)

• 6 Things You’d Never Guess About Google’s
Energy Use

What Search Engines Look At
– Title: the <title> element contains keywords
– Anchor text: the text inside an <a> element, which describes the page it links to
– Landing page: the page an <a> element links to (its href)
– Meta: a <meta> tag in the head section, often used for keywords
– Alt attributes: the alt attribute of an <img> element gives a textual description of the image
– Content: the text on the page
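The elements listed above can all be pulled out of a page with Python's standard-library `html.parser`. The sketch below collects the title, anchor text with its landing page, meta keywords, alt text, and visible text from a hypothetical sample page. Which signals a real engine uses, and how it weighs them, is not shown here. The class name `SignalExtractor` is an illustrative assumption.

```python
from html.parser import HTMLParser
from typing import Optional


class SignalExtractor(HTMLParser):
    """Collect page elements a search engine might look at (illustrative)."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links = []          # (anchor text, landing page) pairs
        self.meta_keywords = []
        self.alt_texts = []
        self.content = []        # visible text chunks outside <title>
        self._in_title = False
        self._href: Optional[str] = None
        self._anchor_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = attrs.get("href")
            self._anchor_text = []
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            words = (attrs.get("content") or "").split(",")
            self.meta_keywords = [w.strip() for w in words if w.strip()]
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append(("".join(self._anchor_text).strip(), self._href))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
            return
        if self._href is not None:
            self._anchor_text.append(data)
        if data.strip():
            self.content.append(data.strip())


page = """<html><head><title>Cat Facts</title>
<meta name="keywords" content="cat, kitten, pets"></head>
<body><p>Cats sleep a lot.</p>
<a href="http://www.cat.com">more about cats</a>
<img src="cat.jpg" alt="a sleeping cat"></body></html>"""

parser = SignalExtractor()
parser.feed(page)
print(parser.title)          # Cat Facts
print(parser.links)          # [('more about cats', 'http://www.cat.com')]
print(parser.meta_keywords)  # ['cat', 'kitten', 'pets']
print(parser.alt_texts)      # ['a sleeping cat']
```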

PageRank Algorithm:
Pioneered by Google
• PageRank works like a voting system
– If page A links to page B, A’s link adds to B’s importance
– Pages linked to by many pages have a high PageRank
– Links from pages with a high PageRank count for more
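The voting idea can be sketched in Python as a simplified PageRank computed by power iteration. Each page splits its current rank evenly among the pages it links to, so votes from highly ranked pages carry more weight. Several details are assumptions of this sketch, not statements about Google's production system: the damping factor of 0.85 (a value commonly quoted in descriptions of PageRank), the fixed 50 iterations, and the rule that a page with no outgoing links spreads its vote over all pages. The four-page web is hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    `damping` is the probability that a "random surfer" follows a link
    instead of jumping to a random page.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)  # one "vote" per link
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no links spreads its vote over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: B is linked to by A, C and D.
web = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["B"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # B
```

Page D, which no page links to, ends up with only the baseline share (1 - 0.85) / 4, the lowest rank in this graph.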

Field Trip: Basic Search
• Google Search Education
http://bit.ly/16ZW6Ow

Advanced Search: Logic Ops
• Logic operator: AND
– human AND powered AND flight
– AND-query hits contain all of the words

• Logic operator: OR
– marshmallow OR strawberry OR chocolate
– OR-query hits contain at least one of the words

• Logic operator: NOT
– tigers AND NOT baseball
– hits contain "tigers" but not "baseball"
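If each word's hit list is treated as a set of page IDs, the three operators correspond to set operations: AND is intersection, OR is union, and AND NOT is difference. Here is a Python sketch. The hit lists and helper names are hypothetical.

```python
# Hypothetical hit lists: word -> set of IDs of pages containing that word.
HITS: dict[str, set[int]] = {
    "human": {1, 2, 3}, "powered": {2, 3, 4}, "flight": {3, 5},
    "marshmallow": {10, 11}, "strawberry": {11, 12}, "chocolate": {13},
    "tigers": {20, 21, 22}, "baseball": {21, 30},
}


def pages_with(word: str) -> set[int]:
    """Hit list for one word (empty if the word is not indexed)."""
    return HITS.get(word, set())


def all_of(*words: str) -> set[int]:
    """AND: pages containing every word (set intersection)."""
    return set.intersection(*(pages_with(w) for w in words))


def any_of(*words: str) -> set[int]:
    """OR: pages containing at least one word (set union)."""
    return set().union(*(pages_with(w) for w in words))


def excluding(pages: set[int], word: str) -> set[int]:
    """AND NOT: drop pages that contain `word` (set difference)."""
    return pages - pages_with(word)


print(sorted(all_of("human", "powered", "flight")))              # [3]
print(sorted(any_of("marshmallow", "strawberry", "chocolate")))  # [10, 11, 12, 13]
print(sorted(excluding(pages_with("tigers"), "baseball")))       # [20, 22]
```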

Combining Logical Operators
(marshmallow OR strawberry) AND sundae

• Logical operators combine like arithmetic operators: parentheses group terms, as in (2 + 3) × 4
• Google also uses a minus sign (-) as an abbreviation for NOT, e.g. tigers -baseball
– http://www.powersearchingwithgoogle.com/course/ps/assets/PowerSearchingQuickReference.pdf
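As in arithmetic, where the parentheses go changes the answer. The Python sketch below uses `|` for OR and `&` for AND on hypothetical hit sets and compares the two possible groupings. The slide does not say which grouping a search engine applies to a query written without parentheses.

```python
# Hypothetical hit lists: word -> set of page IDs containing it.
hits = {
    "marshmallow": {1, 2},
    "strawberry": {2, 3, 4},
    "sundae": {2, 4, 5},
}
m, s, u = hits["marshmallow"], hits["strawberry"], hits["sundae"]

grouped = (m | s) & u    # (marshmallow OR strawberry) AND sundae
ungrouped = m | (s & u)  # marshmallow OR (strawberry AND sundae)

print(sorted(grouped))    # [2, 4]
print(sorted(ungrouped))  # [1, 2, 4]
```

Page 1 mentions marshmallow but not sundae, so it appears only when AND binds more tightly than OR.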

Site Search
• Many sites offer the opportunity to perform a site search
• For example, try this Google search:
Google chief economist Hal Varian site:uoregon.edu
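The slide does not say how Google implements `site:`. As a hedged illustration of the idea, the Python sketch below filters a hit list down to URLs whose host is the given domain or one of its subdomains. It uses the standard library's `urllib.parse.urlparse`. The helper name `restrict_to_site` and the result URLs are hypothetical.

```python
from urllib.parse import urlparse


def restrict_to_site(urls: list[str], site: str) -> list[str]:
    """Keep URLs whose host is `site` or a subdomain of it (hypothetical helper)."""
    site = site.lower()
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Require an exact match or a "." boundary, so notuoregon.edu is excluded.
        if host == site or host.endswith("." + site):
            kept.append(url)
    return kept


results = [
    "https://economics.uoregon.edu/events",
    "https://www.google.com/about",
    "https://uoregon.edu/news",
    "https://notuoregon.edu/page",
]
print(restrict_to_site(results, "uoregon.edu"))
# ['https://economics.uoregon.edu/events', 'https://uoregon.edu/news']
```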

Field Trip: Power Search
• Google Search Education
http://www.powersearchingwithgoogle.com/

Alternatives to the Search Giant

How Wolfram|Alpha Works

Cloud Storage
• Facebook: 300 petabytes (PB)
• Microsoft Hotmail: 100 PB
• Microsoft SkyDrive: 10 PB
• Amazon S3: 900 PB
• Dropbox: 40 PB

Ch. 5: Assessment
Learning Outcomes - Know the following
