Searching the Internet More Effectively


Published in: Technology, Business
  • 22/03/12 (c) Karen Blakeman 2012
  • Searching the Internet More Effectively

    1. 1. Searching the Internet More Effectively. Barnsley, 29th February 2012. Karen Blakeman, RBA Information Services. Slides are available online. Twitter: @karenblakeman. This presentation is licensed under a Creative Commons Attribution 3.0 License.
    2. 2. How it all started. Before 1992: priced electronic databases, for example Lexis (legal), Nexis (news) and technical/scientific data, and print (government Daily Lists, Annual Reports, directories, local newspapers, official statistics). 1992: the Internet can be accessed by anyone, but it was 2-3 years before significant information started appearing on the web. The increase in the amount of data and information led to the development of tools that indexed and searched the content of web pages: Lycos, Excite, AltaVista, Hotbot.
    3. 3. How the search tools worked (and still do in part): they "crawl" the internet looking for new and updated pages by following links. Copies of pages and documents are added to a database that is publicly searchable. Results are sorted according to: – how often the words you looked for appear in the page – where they appear (words in the title and first few sentences are given a higher ranking) – and many other criteria not disclosed by the search engines. They do not cover: – password-protected sites – databases or sites where you have to fill in a form to find the information, for example Companies House.
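The crawl, index and rank steps on this slide can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any real search engine is implemented: the `WEB` dictionary, its page names and text, and the `title_weight` of 3 are all made up for the example, and only title hits (not "first few sentences") get the extra weight.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": every page, title, text and link is made up.
WEB = {
    "home": {"title": "Pond birds",
             "body": "Ducks, coots and moorhens share the pond. Coots are noisy.",
             "links": ["coots", "lions"]},
    "coots": {"title": "Coots mating behaviour",
              "body": "Coots build floating nests.",
              "links": ["home"]},
    "lions": {"title": "Lions", "body": "Lions live in prides.", "links": []},
    # No page links to this one, so a link-following crawler never finds it.
    "orphan": {"title": "Coots in winter", "body": "Nobody links here.", "links": []},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


def crawl(web, seed):
    """Follow links breadth-first from the seed; return page ids in visit order."""
    seen, queue = [seed], deque([seed])
    while queue:
        for link in web[queue.popleft()]["links"]:
            if link in web and link not in seen:
                seen.append(link)
                queue.append(link)
    return seen


def build_index(web, page_ids):
    """Inverted index: word -> set of crawled pages that contain it."""
    index = defaultdict(set)
    for page_id in page_ids:
        page = web[page_id]
        for word in tokenize(page["title"] + " " + page["body"]):
            index[word].add(page_id)
    return index


def score(page, words, title_weight=3):
    """Count query words; a title hit counts title_weight times (assumed weight)."""
    title, body = tokenize(page["title"]), tokenize(page["body"])
    return sum(title_weight * title.count(w) + body.count(w) for w in words)


def search(web, index, query):
    """Return pages containing every query word, best score first."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits, key=lambda p: (-score(web[p], words), p))


crawled = crawl(WEB, "home")
index = build_index(WEB, crawled)
print(crawled)                       # pages reachable by following links
print(search(WEB, index, "coots"))   # title match ranks above body-only match
```

The "orphan" page mentions coots but is never returned, because nothing links to it and so the crawler never copies it into the index.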
    4. 4. Then along came... 11 November 1998. Source: The Internet Archive, www.archive.org
    5. 5. How was Google different? Links (citations) are a major part of ordering search results.
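To make the "links as citations" idea concrete, here is a minimal sketch of a PageRank-style calculation. It is a simplified textbook version, not Google's actual ranking: the `LINKS` graph is invented, and the damping factor of 0.85 and the 50 iterations are conventional assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links; a link from a high-scoring page counts for more."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its current score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

ranks = link_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page "c" comes out on top because three pages link to it, and "a" comes second because its single incoming link is from the highly ranked "c".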
    6. 6. Where is Google now? 2001: revenues $86,426 thousand, net income $10,964 thousand. 2011: revenues $37,905 million, net income $9,737 million. In 2011, 96% of revenues were from advertising. Google is mass-market and consumer-oriented. Serious researchers wanting reliable, structured search are a minuscule fraction of its customer base.
    7. 7. How Google organises and sorts information. It has a primary index of higher "quality" documents and a secondary index. Only the primary index is searched when running straightforward searches. The secondary index comes into play with more complex searches and if only a small number of results are found. “Dear Bing, We Have 10,000 Ranking Signals To Your 1,000. Love, Google”: over 200 “signals”, each of which may have over 50 variations (200 × 50 = 10,000).
    8. 8. How Google ranks and organises your results. Google personalizes and tailors your results depending on your location, computer/device, browser, past searches, what you have looked at in the past, your +1s, your Google+ account, what you had for breakfast... and anything else it can find by rummaging around in your Google dashboard. To see what's in your dashboard, log in to your Google account and open the dashboard page. See “Google personalisation: web history isn’t the only problem”.
    9. 9. What I see on my screen for a search is not what you’ll see on yours.
    10. 10. Google knows best! Google automatically looks for variations of your search terms. Example search: Hewish mild. Google decided to change my search to Jewish mild without asking. Placing a phrase within quote marks, "Hewish mild", will usually force an exact match.
    11. 11. For 10 days in February 2011: coots = lions. See “Google decides that coots are really lions” and “Update on coots vs. lions”.
    12. 12. Coots = lions
    13. 13. Three search tricks. These three techniques can change what Google (and other search engines) decides to give you, and also the order of the results. Repeat important search terms: coots coots mating behaviour (found coots). Change the order of your terms: mating behaviour coots (found coots). Change one of your search terms: coots mating behaviour (found lions); coots courtship behaviour (found coots); coots mating ritual (found coots).
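If you want to try the three tricks systematically, a short script can generate the alternative queries for you to paste into a search engine. This is only a sketch: the function names are invented for this example, and the substitute words have to be supplied by you.

```python
def repeat_term(terms, term):
    """Trick 1: repeat an important term directly after its first occurrence."""
    i = terms.index(term)
    return terms[: i + 1] + [term] + terms[i + 1:]


def move_first_to_end(terms):
    """Trick 2: change the order by moving the first term to the end."""
    return terms[1:] + terms[:1]


def substitute(terms, old, new):
    """Trick 3: swap one search term for an alternative."""
    return [new if t == old else t for t in terms]


def query_variants(query, important, substitutions):
    """Return the reworded queries as plain strings, one per trick or substitution."""
    terms = query.split()
    variants = [repeat_term(terms, important), move_first_to_end(terms)]
    variants += [substitute(terms, old, new) for old, new in substitutions]
    return [" ".join(v) for v in variants]


for variant in query_variants(
    "coots mating behaviour",
    important="coots",
    substitutions=[("mating", "courtship"), ("behaviour", "ritual")],
):
    print(variant)
```

Run as shown, it prints the four alternative searches from the slide, starting with "coots coots mating behaviour".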
    14. 14. Excluding pages containing words. Want to exclude pages containing a term? Place a - (minus sign) immediately before the term. Use with care, as you may miss important material. Excluding lions from our bizarre coots search: coots mating behaviour -lions (results shown on the original slide).
    15. 15. Coots=lions was an extreme example of how Google can work. We think Google was doing the following: - assumed a typing error or was running a mobile/smartphone predictive text algorithm (coots=cats) - ran an automatic variation/synonym search on cats - used a search frequency rule and found that lions mating behaviour was requested more than cats mating behaviour.
    16. 16. Dear Google, stop messing with my search. Google no longer looks for all of your terms in a page.
    17. 17. See what Google sees. Hover over a result and a "preview" of the page should appear to the right, together with a Cached link: this is Google's copy of the page.
    18. 18. “When you do a multi-term query on Google (even with quoted terms), the algorithm sometimes backs-off from hard ANDing all of the terms [...] It’s clear that people will often write long queries (with anywhere from 5 to 10 terms) for which there are no results. Google will then selectively remove the terms that are the lowest frequency to give you some results (rather than none). ... Soft AND is a way to reduce the overall frustration and give the searcher something to examine (and with luck, a chance to reformulate their query).” Dan Russell
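The "soft AND" behaviour in the quote can be sketched as follows. This is a guess at the general idea only, not Google's implementation: the tiny `INDEX` is invented, and "lowest frequency" is assumed here to mean the term that appears in the fewest pages.

```python
def hard_and(index, terms):
    """Pages that contain every term (empty if any term is unknown)."""
    page_sets = [index.get(t, set()) for t in terms]
    return set.intersection(*page_sets) if page_sets else set()


def soft_and(index, terms):
    """Drop the rarest term, one at a time, until the remaining terms match something.

    Returns (matching pages, list of dropped terms).
    """
    remaining = list(terms)
    dropped = []
    while remaining:
        hits = hard_and(index, remaining)
        if hits:
            return hits, dropped
        # Assumption: "lowest frequency" = the term found in the fewest pages.
        rarest = min(remaining, key=lambda t: len(index.get(t, set())))
        remaining.remove(rarest)
        dropped.append(rarest)
    return set(), dropped


# Hypothetical index: term -> ids of the pages that contain it.
INDEX = {
    "coots": {1, 2, 3},
    "mating": {2, 3, 4},
    "behaviour": {3, 4, 5},
    "hewish": {9},
}

pages, dropped = soft_and(INDEX, ["coots", "mating", "behaviour", "hewish"])
print(pages, dropped)
```

In this example no page contains all four terms, so the rarest term ("hewish") is silently dropped and page 3 is returned. That is helpful when you mistyped, and unhelpful when the rare term was the whole point of the search.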
    19. 19. Verbatim. Forces Google to run an exact match search. Run your search first and then select Verbatim from the left-hand menu on your results page. Cannot be combined with time options in the side bar. See “Google: Verbatim for exact match search”.
    20. 20. Google doing its own thing can be good
    21. 21. Google's new(ish) social network: Google Plus (Google+). Google is trying to force people to create a Google+ profile. Search Plus Your World (SPYW), also referred to as Search+, is now available in [...] and is the default. It gives priority to content from people in your Google+ network if you are signed in to your account. (And the next Google killer is... Google!)
    22. 22. SPYW: currently being tested on Google.com. Before and after.
    23. 23. SPYW: currently being tested on Google.com.
    24. 24. Google results side bar. The options help you focus your search. They vary depending on the type of search, e.g. web, news, images. Open up the "more" options to see everything.
    25. 25. Google side bars: Images, Videos, News, Books, Blogs.
    26. 26. Google images: not always what you expect. Search for patent and select the colour red from the side bar. (Thanks to Arthur Weiss for the example.)
    27. 27. Related searches
    28. 28. Translated foreign pages for a different perspective. Google suggests languages from the context of the search, but you can choose your own. Your search is translated and the results are translated into your language.
    29. 29. Problems finding information on a particular site? Use Google's site: command. For example, when trying to find information on Reading Borough Council's recycling policy by searching the council's own site.
    30. 30. Go to Google and type in recycling policy followed by site: and the council's domain name.
    31. 31. Or if you are interested in all government (central, departmental and local) recycling policies: recycling policy site:gov.uk
    32. 32. Combine with date option in the side bar
    33. 33. LGSearch Custom Search Engine (CSE)
    34. 34. Create your own Google custom search engine. For: – regularly searched sites – selected sites on a subject or type of organisation. Cannot include password-protected sources or sites where you have to fill in a form to access the information. See: information on setting up a Google Custom Search Engine (CSE); Google's blog on custom search.
    35. 35. Looking for a particular type of information, for example statistics, a research report or an expert presentation? Use the filetype: command. For statistics: car ownership UK filetype:xls; car ownership UK filetype:xlsx. For government, research and industry reports: UK oil consumption forecasts filetype:pdf. For conference presentations or trying to locate an expert: renewable energy UK filetype:ppt; renewable energy UK filetype:pptx.
    36. 36. You can combine commands: renewable energy UK filetype:ppt site:ac.uk. There is also an advanced search screen with more options. See: Google Commands.
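If you build the same kinds of search repeatedly, the commands covered so far (quote marks, the minus sign, filetype: and site:) can be assembled by a small helper. This is a minimal sketch: `build_query` and `search_url` are names invented for this example, and the base URL is an assumption about the usual form of a Google search link rather than a documented interface.

```python
from urllib.parse import urlencode


def build_query(terms, phrase=None, exclude=(), filetype=None, site=None):
    """Assemble a search string from plain terms plus optional commands."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')                 # exact phrase in quote marks
    parts += [f"-{term}" for term in exclude]       # minus sign excludes a term
    if filetype:
        parts.append(f"filetype:{filetype}")        # restrict to a file format
    if site:
        parts.append(f"site:{site}")                # restrict to a site or domain
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Turn a query string into a link (base URL is an assumption for illustration)."""
    return f"{base}?{urlencode({'q': query})}"


query = build_query(["renewable", "energy", "UK"], filetype="ppt", site="ac.uk")
print(query)
print(search_url(query))
print(build_query(["coots", "mating", "behaviour"], exclude=["lions"]))
```

The first line printed is the combined search from the slide, renewable energy UK filetype:ppt site:ac.uk. The same string can be typed straight into the search box, so the URL step is optional.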
    37. 37. Google alternatives: Bing and Yahoo. Yahoo now uses Bing’s database and ranking. Many of the Advanced Search commands are similar to Google’s; see Search Tools Summary and Comparison. Some of the interesting developments and features are only available in the US version. Results tend to be more consumer/retail focused unless you use the advanced search features. Coverage is not identical to Google’s and sometimes yields important unique content. Sometimes more up to date than Google.
    38. 38. DuckDuckGo: silly name but a neat little search tool. No tracking, no “filter bubble”. Commands: site:, filetype:, and sort:date to sort by date (uses results from Blekko). Syntax and keyboard shortcuts are documented online.
    39. 39. Flickr, to search for images. Use the default search box, Flickr Creative Commons or the advanced search screen.
    40. 40. Statistics
    41. 41. MySociety
    42. 42. MySociety
    43. 43. Local crime and policing information for England and Wales.
    44. 44. Professional network. For people and companies. For identifying experts in a field. See: Boolean Black Belt-Sourcing/Recruiting.
    45. 45. Facebook. Personal and business pages are relatively easy to find. There is no easy way to search content within pages.
    46. 46. Local "stuff". Web pages, local papers, "what's on", local forums/discussion boards, Facebook pages, Twitter. Twitter search. Set up lists (can be kept private) and view them through, for example, a desktop program or mobile app.
    47. 47. My local stuff on Tweetdeck
    48. 48. Create your own newspaper.
    50. 50. Copyright. Always check the copyright of anything that you want to use or incorporate into a document or web page. Always, always check and double check the copyright of images: they may have a digital watermark and be tracked, e.g. Digimarc. Creative Commons does not mean you can do what you like with the text/image: there are six licences. See: “Open-licencing your images. What it means and how to do it.” by Andy Mabbett (aka pigsonthewing); Karen Blakeman's Blog, “Free-to-use images might not be”.
    51. 51. Evaluating resources. Type of web site, for example .gov, .edu. Who is really behind the site? – use a domain name register to look up who owns the domain – you do NOT want to see that the domain name is hosted by an organisation such as the one shown on the original slide.
    52. 52. Evaluating resources. Date of publication, last updated. Check the text for clues to the publication date. The stated date for a web page or document may be automatically generated when it is put onto the web site. After a web site redesign, pages are re-uploaded and are given a new publication date. Some pages are generated "on the fly" so will always have today's date.
    53. 53. Quoting and referencing. Make it clear when you are quoting someone else and always quote the source of data. Give at least the title of the article and the URL in the text of a document. Full reference: – author (and/or organisation), title of page/document, URL (web address; do not use shortened URLs), date of publication (if known), date you accessed the document – for example: George Monbiot, In Praise of Distrust, 27th February 2012, [Accessed 28th February 2012] – organisations and publishers may have their own preferred format. If the information is critical, make a local copy.
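The full reference format above is regular enough to generate with a short script, which helps keep a long list of web references consistent. This is a sketch under assumptions: the field order follows the George Monbiot example on the slide, the URL is a made-up placeholder (the slide's own URL is not preserved in this transcript), and the list of link-shortening hosts is a small illustrative sample, not a complete one.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# A few well-known link shorteners (illustrative sample only, not exhaustive).
SHORTENER_HOSTS = {"bit.ly", "t.co", "goo.gl", "tinyurl.com"}


def is_shortened(url):
    """True if the URL's host is one of the listed link-shortening services."""
    return urlparse(url).hostname in SHORTENER_HOSTS


@dataclass
class WebReference:
    author: str                      # person and/or organisation
    title: str                       # title of the page or document
    url: str                         # full web address, not a shortened one
    accessed: str                    # date you accessed the document
    published: Optional[str] = None  # date of publication, if known

    def format(self):
        """Render as: author, title, [published,] url [Accessed date]."""
        if is_shortened(self.url):
            raise ValueError("Use the full URL, not a shortened one")
        parts = [self.author, self.title]
        if self.published:
            parts.append(self.published)
        parts.append(self.url)
        return ", ".join(parts) + f" [Accessed {self.accessed}]"


ref = WebReference(
    author="George Monbiot",
    title="In Praise of Distrust",
    url="https://example.org/in-praise-of-distrust",  # hypothetical placeholder URL
    accessed="28th February 2012",
    published="27th February 2012",
)
print(ref.format())
```

Organisations and publishers with their own preferred format would only need a different `format` method; the fields collected stay the same.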
    54. 54. Keeping up to date: Inside Search; Google Blog; Google Scholar Blog; Search Engine Land; Search Engine Watch; Boolean Black Belt-Sourcing/Recruiting; Karen Blakeman’s Blog; Phil Bradley's weblog.
    56. 56. When are road works not road works? When they are classified as Network Rail bridge works! (CC 3.0 Attribution Non-commercial)