14 October 2013

Surfacing the Web
Websearch Academy 2013

WebSearch Academy
Internet Librarian International

Surfacing the deep web (2 slides per page)

Presentation given at Internet Librarian International 2013 (Websearch Academy) on 14 October 2013, on finding deep-web / invisible-web content.

Published in: Business, Technology, Design
Surfacing the deep web (2 slides per page)

  1. WebSearch Academy, Internet Librarian International, 14 October 2013
     Surfacing the Deep Web
       Arthur Weiss
       Email: a.weiss@aware.co.uk / Twitter: @awareci
       Web: www.marketing-intelligence.co.uk
       Tel: +44 20 8954 9121 • Fax: +44 20 8954 2102
       © Arthur Weiss, AWARE, 2013
     Not everything can be found with Google…
       The ‘Invisible Web’ or ‘Deep Web’ consists of web pages and documents which are not indexed by conventional search engines or are poorly or incompletely indexed.
  2. 5 Types of “Invisibility”
       • Not search engine optimised, so pages fail to appear in “simple” searches
       • Not indexed by search engines
       • Excluded from search index
       • Subscription or proprietary content
       • Encrypted or non-indexable content
     Know your tool kit
       Standard Google or multiple approaches & tools
  3. What do I need to find?
       What sort of needle? What sort of haystack?
       http://www.morguefile.com/archive/display/21091
       • Why will the information be available?
       • Where will it be held? (Who will know it?)
       • Can I obtain it legally and ethically from this source and, if so, how?
       • If not, are there other sources or ways of obtaining the information?
       • After obtaining the information, are any checks needed to verify it?
       • What is the information’s relationship to other information?
  4. Not everything is online or can be found!
       • Try to find:
           - Original TV coverage of the storming of the Bastille¹
           - A newspaper interview with Christopher Columbus, following his return from discovering America
           - A recording of Abraham Lincoln delivering the Gettysburg Address
           - A photo of Jesus in his crib (Question from a 9-year-old: “Why didn’t anybody take photos with their phones?”)
       ¹ With thanks to Karen Blakeman of RBA Information (rba.co.uk) for these examples
     “Forty-two! Is that all you’ve got to show for seven and a half million years’ work?”
     “I checked it very thoroughly and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you’ve never actually known what the question is.”
       Douglas Adams, “The Hitchhiker’s Guide to the Galaxy”
       If your search approach is wrong, it doesn’t matter which approach or tool you use, or how you use it. Your results will be poor or wrong.
  5. Before starting to search, consider sources for the subject / topic of interest…
       • Why is information likely to be available? Consider also file formats, and location of search terms
       • What search tool / approach is most likely to access or index the information’s location (container)?
       • Are there unique terms or jargon that lead to a specialist tool? e.g. lung cancer (consumer) versus pulmonary carcinoma (medical)
       • Are there societies, organisations, people, or groups that may have information? (Who/where else could have information?)
       • Would any of the relevant pages be in another language? “cheap hotel in Dubai” OR “فندق اقتصادي في دبي”
     Before starting to search: consider search terms for the topic or subject of interest
       • Are there any synonyms or variant spellings? Tyre or tire; Aluminum; Candy or sweet; Basle or Basel
       • Are there any other words likely to be in documents on the topic?
       • Are any keywords part of a common phrase?
       • Are any keywords likely to be in irrelevant documents that should be excluded from searches?
       • How might the information be written?
           - “I work for Xcompany” to search for employees of Xcompany
           - “X is better than” for comparisons
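The search-term checklist above can be turned into a query string mechanically. The following is a minimal sketch, not the presenter's method: the function name `build_query` and the excluded term `bicycle` are illustrative assumptions, and it assumes an engine that accepts `OR`, quoted phrases, and a leading minus sign for exclusion (check the engine's own help pages before relying on this syntax).

```python
def build_query(term_groups, phrases=(), exclude=()):
    """Combine synonym groups with OR, add exact phrases, and exclude noise terms.

    term_groups: list of lists; each inner list holds synonyms or variant spellings.
    phrases:     strings to be searched as exact phrases.
    exclude:     single words likely to appear only in irrelevant documents.
    """
    parts = []
    for group in term_groups:
        # Quote multi-word variants so they are treated as phrases.
        variants = [f'"{t}"' if " " in t else t for t in group]
        if len(variants) > 1:
            parts.append("(" + " OR ".join(variants) + ")")
        else:
            parts.append(variants[0])
    parts.extend(f'"{p}"' for p in phrases)
    parts.extend(f"-{t}" for t in exclude)
    return " ".join(parts)


# Variant spellings taken from the slide; "bicycle" is a hypothetical noise term.
query = build_query(
    term_groups=[["tyre", "tire"], ["Basle", "Basel"]],
    phrases=["is better than"],
    exclude=["bicycle"],
)
print(query)  # (tyre OR tire) (Basle OR Basel) "is better than" -bicycle
```

Keeping each synonym group as its own list makes it easy to add a variant in another language, as the slide suggests, without touching the rest of the query.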
  6. Research Planning
       Information Requirements
       • Break down into individual questions that, when answered, will provide the required knowledge
       • Don’t start searching without knowing what you are looking for, and why
     An example research plan
       Copy & fill in sheet for each key information question / topic
       Research Topic
       Research Questions (break down topic into answerable questions)
       Sources | Search Approach / Parameters | Type of information expected | Comments / Possible problems
       LinkedIn | Job title, current employer, etc. | People profiles | May not be accurate or in-date
       Google Scholar | Author name, topic, date, etc. | Citations, academic research papers | Doesn’t cover everything
       National statistics | Site search engine | Census & demographic data | May be old or incomplete
  7. Types of “Invisibility”: not search engine optimised, so pages fail to appear in “simple” searches; not indexed by search engines; excluded from search index; subscription or proprietary content; encrypted or non-indexable content
     Advanced Searching
       • Use advanced search operators and options, e.g. Filetype: / InTitle: / InUrl: / .. (numeric) and * (wildcard)
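As an illustration of how the operators named on the Advanced Searching slide combine, here is a small sketch that assembles a query string. The helper `advanced_query` and the example values (coffee, xls, 2010 to 2013) are hypothetical; the operator spellings follow the slide, and whether a given engine still honours each one should be checked against that engine's documentation.

```python
def advanced_query(keywords, filetype=None, intitle=None, inurl=None, numeric_range=None):
    """Append filetype:, intitle:, inurl: and a numeric range (low..high) to keywords."""
    parts = [keywords]
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        parts.append(f"intitle:{intitle}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if numeric_range:
        low, high = numeric_range
        # Two dots between numbers form the numeric-range operator.
        parts.append(f"{low}..{high}")
    return " ".join(parts)


# The * inside the quoted phrase is the wildcard: any word may fill the gap.
example = advanced_query(
    '"coffee * statistics"',
    filetype="xls",
    intitle="coffee",
    inurl="data",
    numeric_range=(2010, 2013),
)
print(example)  # "coffee * statistics" filetype:xls intitle:coffee inurl:data 2010..2013
```

Restricting by file type is often the quickest way to reach spreadsheets and reports that rank poorly in a plain keyword search.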
  8. Search Engines – not just Google
     Types of “Invisibility”: not search engine optimised, so pages fail to appear in “simple” searches; not indexed by search engines; excluded from search index; subscription or proprietary content; encrypted or non-indexable content
  9. Specialist Search / Deep Web Search
     Search for Information “Containers”
       • Knowing a reason for the information to be available can lead to an information source
           - Who else would want this information?
           - Search for topic + “Database” e.g. Coffee database – first two results:
  10. Case Examples – Economics by Country
      Case Examples – Trade Statistics
  11. Case Examples – Economic Indicators
      Case Examples – Genealogy
  12. Types of “Invisibility”: not search engine optimised, so pages fail to appear in “simple” searches; not indexed by search engines; excluded from search index; subscription or proprietary content; encrypted or non-indexable content
      Proprietary sites / Blocked from Index
        • Register for password-protected sites
        • Use site search or site map, if available
        • If a robots.txt file exists, you may be able to view the hidden pages, e.g. nytimes.com/robots.txt
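To show what reading a robots.txt file for "hidden" paths involves, the sketch below extracts the `Disallow` entries for each user-agent. The sample file content is invented for illustration and is not the real nytimes.com file; the function name `disallowed_paths` is likewise an assumption. A path listed here is only excluded from crawling: it may or may not be viewable directly.

```python
# Hypothetical robots.txt content, used only to demonstrate the parsing.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/private/
Disallow: /search
Allow: /search/help

User-agent: ExampleBot
Disallow: /
"""


def disallowed_paths(robots_txt):
    """Map each user-agent to the list of paths named in its Disallow rules."""
    rules = {}
    agents = []       # user-agents that the current group of rules applies to
    in_rules = False  # True once the current group has seen a non-user-agent line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A user-agent line after rules starts a new group.
                agents = []
                in_rules = False
            agents.append(value)
            rules.setdefault(value, [])
        else:
            in_rules = True
            if field == "disallow" and value:
                for agent in agents:
                    rules[agent].append(value)
    return rules


for agent, paths in disallowed_paths(SAMPLE_ROBOTS_TXT).items():
    print(agent, paths)
```

The disallowed paths are candidates for browsing directly or for running through the site's own search, in line with the other suggestions on the slide.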
  13. Types of “Invisibility”: not search engine optimised, so pages fail to appear in “simple” searches; not indexed by search engines; excluded from search index; subscription or proprietary content; encrypted or non-indexable content
      Content that can’t / won’t be indexed
        • Non-textual information e.g. multimedia / audiovisual
            - Bing has search operators that can find RSS feeds (hasfeed:) and pages containing specific types of file (e.g. mp3 files – contains:mp3)
            - Search for related textual information e.g. descriptions, or sources (e.g. artwork or film titles)
        • Encrypted information / .Onion sites
            - Project Tor (torproject.org) and the TOR browser
            - Access encrypted sites via proxy servers
  14. Searching TOR
        • On regular Google: fake passport site:onion.to
      TOR / .Onion Sites
  15. Any Questions?
        Arthur Weiss is the managing director of AWARE, a UK-based consultancy specialising in marketing & competitive intelligence analysis.
        Contact Details:
        Web Sites: www.marketing-intelligence.co.uk
        E-mail: a.weiss@aware.co.uk
        Twitter: @awareci
        Telephone: +44 20 8954 9121
        Fax: +44 20 8954 2102
