Searching the internet - better with Google / Google not always best


Search clinic for international music students at Codarts Rotterdam

Published in: Education
  • Assignment: refine the search until none of the first 50 results is irrelevant, keeping these points in mind; use a thesaurus or Word synonyms; use truncation

    1. searching the internet: better with Google / Google not always best
       Eric Sieverts (@sieverts), CODARTS, 04-03-2013 (UB Utrecht, HvA-MIC, GO Opleidingen)
    2. agenda
       • searching the web
       • smart searching
       • google options
       • beyond google
       • beyond general web search
       for all links see: http://sieverts.pbworks.com/codarts2
    3. the general agenda [diagram]
       • general web search: web ?=? everything? how to …
       • specific material search: importance of specific material types? how to …
       • when & why
    4. an ever changing google landscape
       • unreliable numbers
       • irreproducible results
       • disappearing functions
       • changing interfaces
    6. building block approach
       for systematic searching in structured information systems (like JStor etc.), start analytically with the so-called building block approach, e.g. for the subject "modern american composers":
       – break it up into its 3 facets
       – collect keywords for each facet
       – combine the keywords with OR (within a facet) and AND (between facets):
         modern: modern OR contemporary OR 20th century OR twentieth century OR …
         AND american: american OR america OR usa OR united states OR …
         AND composers: composer OR composers OR songwriters OR …
    7. building block approach
       combining the facets makes the query:
       (modern OR contemporary OR "twentieth century" OR "20th century") AND (america OR american OR usa OR "united states") AND (composer OR composers OR songwriter OR songwriters)
    8. building block approach also with Google?
       web search engines are not specifically designed for such structured queries, but it is possible. Google and Yahoo make it even easier: you may omit the parentheses and the AND operator, since AND is the default (implied AND):
       modern OR contemporary OR "twentieth century" OR "20th century" america OR american OR usa OR "united states" composer OR composers OR songwriter OR songwriters
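The building block approach of slides 6 to 8 is mechanical enough to sketch in a few lines. The Python below is a minimal sketch that is not part of the original slides; the function names `quote` and `building_block_query` are hypothetical. It ORs the keywords within each facet and ANDs the facets, either with explicit parentheses (as for structured databases) or in the compact implied-AND form the slides describe for Google and Yahoo.

```python
def quote(term: str) -> str:
    """Put multi-word terms between double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def building_block_query(facets: list[list[str]], implied_and: bool = False) -> str:
    """OR the keywords within each facet, then AND the facets together.

    With implied_and=True the parentheses and AND operators are left out,
    giving the shorter form that Google and Yahoo accept.
    """
    groups = [" OR ".join(quote(term) for term in facet) for facet in facets]
    if implied_and:
        return " ".join(groups)
    return " AND ".join(f"({group})" for group in groups)


# the three facets of "modern american composers"
facets = [
    ["modern", "contemporary", "twentieth century", "20th century"],
    ["america", "american", "usa", "united states"],
    ["composer", "composers", "songwriter", "songwriters"],
]

print(building_block_query(facets))
print(building_block_query(facets, implied_and=True))
```

The first printed line is the query from slide 7, the second the Google form from slide 8. Adding a synonym to a facet only means extending one inner list.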
    9. relevance ranking (1)
       Google (and other web search engines) primarily focus on presenting search results in order of relevance. How do they know what is relevant?
       – they interpret the importance of words for the subject matter of the retrieved documents (are your search terms present in the title, url, headings, ...?)
         • you can increase the importance of a certain term in your query by repeating that word a couple of times
       – they estimate the importance of the relation between words in the retrieved documents, i.e. whether:
         • your search words occur close together
         • your search words occur in the same order as you entered them
       >> formulate your query the way you expect the answer to be formulated
    10. word order matters
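Web search engines do not publish their ranking formulas, so the sketch below is only a toy model of the signals named on slides 9 and 10: term importance (title matches and repeated query words weigh more), proximity of the search words, and their order. All weights and function names are made-up assumptions for illustration, not how Google scores pages.

```python
import re
from collections import Counter
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def shortest_span(tokens: list[str], terms: set[str]) -> Optional[int]:
    """Length of the shortest run of tokens containing every query term, or None."""
    best = None
    for start in range(len(tokens)):
        seen = set()
        for end in range(start, len(tokens)):
            if tokens[end] in terms:
                seen.add(tokens[end])
            if seen == terms:
                length = end - start + 1
                if best is None or length < best:
                    best = length
                break  # any longer window from this start is not shorter
    return best


def toy_score(query: str, title: str, body: str) -> float:
    """Toy relevance score; the weights 3 and 5 are arbitrary illustration values."""
    query_tokens = tokenize(query)
    weights = Counter(query_tokens)  # a repeated query word counts more
    terms = set(query_tokens)
    ordered_terms = list(dict.fromkeys(query_tokens))  # distinct terms in typed order
    title_tokens, body_tokens = tokenize(title), tokenize(body)

    # 1. term importance: matches in the title count more than in the body
    score = 0.0
    for term, weight in weights.items():
        score += weight * (3 * title_tokens.count(term) + body_tokens.count(term))

    # 2. proximity: the closer the terms occur together, the higher the bonus
    span = shortest_span(body_tokens, terms)
    if span is not None:
        score += 5 * len(terms) / span

    # 3. word order: bonus if the terms occur as a phrase in the typed order
    n = len(ordered_terms)
    if n and any(body_tokens[i:i + n] == ordered_terms
                 for i in range(len(body_tokens) - n + 1)):
        score += 5
    return score


query = "modern american composers"
near = "a survey of modern american composers and their work"
far = "composers from many countries, including some american and a few modern ones"
print(toy_score(query, "", near), toy_score(query, "", far))
```

Both sample texts contain all three words once, yet the one with the words close together and in the typed order scores much higher, which is the point of "formulate your query the way you expect the answer to be formulated".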
    11. relevance ranking (2)
       Google (and other web search engines) primarily focus on presenting search results in order of relevance. How do they know what is relevant?
       – the importance or quality of a retrieved web page is deduced from the number and importance of links from other sites (a PageRank is calculated for each page)
       – the importance of retrieved web pages for your personal interests is deduced from your previous search and browse behaviour, which is monitored whenever you're logged in
       since every search engine uses somewhat different algorithms for its relevance calculations (and their coverage differs as well), there tends to be little overlap between the top 10 results from different engines
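The link-based part of relevance ranking goes back to PageRank. The sketch below is the simplified textbook form of PageRank computed by power iteration, with the commonly cited damping factor of 0.85 taken as an assumption; Google's current ranking combines many more, unpublished, signals. The four-page web and the function name are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    links maps each page to the pages it links to.
    """
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # every page gets a small base share ("random surfer" jumps)
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its rank on, split evenly over its outgoing links
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # a page without outgoing links spreads its rank over all pages
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(web).items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page c, which receives the most links, ends up with the highest score; a profits because its only inbound link comes from that important page, while b and d, which nobody links to, keep only the base share.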
    12. search terms
       the use of proper search terms is crucial for search success; think of:
       – singular / plural, verbs / nouns / adjectives, conjugations, ...
       – spelling variations (behavior / behaviour)
       – compound terms (writer / songwriter)
       – synonyms, acronyms (compact disc / compact disk / cd / digital disc)
       how would the answer to my question be formulated in a relevant document? "think as if you were the document":
       – the right terms
       – as an "exact phrase" or in the most probable word order
       – use a wildcard for variable words ("modern * * composers")
       – use known examples from the list you are looking for
       – popular versus scientific terms, etc.
    13. refining searches
       if results are too broad or too diverse:
       – add another essential term or set of terms to your query
       – see what your search engine suggests while you type your query
       – exclude an unwanted term with NOT (francis bacon NOT philosopher)
       NB: Google does not understand NOT; use a minus sign instead: francis bacon -philosopher
    14. nice interactive infographic "how search works": http://www.google.com/insidesearch/howsearchworks/thestory/
    15. is Google outsmarting us?
       Google tries to improve and broaden your queries:
       • automatic spelling corrections (veilgheid >> veiligheid, Dutch for "safety")
       • automatic search for words with the same word stem (singular/plural, verb, conjugation, inflection, …)
       • expands acronyms (jfk >> john f kennedy | wwii >> world war II)
       • adds some synonyms (vaccination >> immunization)
       • transforms separate words into a compound term and vice versa (veiligheid maatregel >> veiligheidsmaatregel | catfood >> cat food)
       • may treat a term as optional if it does not differentiate enough
       this happens more often and more elaborately in English than in Dutch; you are never sure what is done, when, or not
       • personalisation based on previous search behaviour
       but what if you don't like all of this? >> "verbatim"
    16. search only literally for the exact words you entered; on google.nl: "woord voor woord" ("word for word")
    17. some more "how to"
       • domain search: site:edu OR site:edu.* [for all edu (sub)domains]; site:shell.com OR site:philips.com
       • url search: inurl:novelty
       • title search: intitle:catalytic
       • filetype search: filetype:pdf; filetype:xls OR filetype:xlsx; filetype:doc OR filetype:docx; filetype:rss [more types than shown in the advanced search drop-down menu]
       • exact search: "greenhouses" [or VERBATIM for all words]
    18. advanced search
       Google is hiding its advanced search screen: you must perform a simple search first to get the "cog wheel"
    19. some more "how to"
       some of this can be done from the advanced search screen, but the regular search box offers greater flexibility once you know the syntax
       • domain search [in combination with real search terms]: site:codarts.nl; site:edu OR site:edu.* [for all edu (sub)domains]; site:last.fm OR site:spotify.com
       • url search: inurl:course
       • title search: intitle:guitar
    20. some more "how to" (2)
       • filetype search: filetype:pdf; filetype:xls OR filetype:xlsx; filetype:doc OR filetype:docx; filetype:rss [more types than shown in the advanced search drop-down menu]
       • numeric search: 10..20 [includes all values in between]; $10..$20 [not for other currencies]
       • punctuation: &, %, dot, ... [can be searched]; €, /, ", comma, ... [are ignored]
       • exact search: "greenhouses" [or VERBATIM for all words]
       • synonym search: ~guitar
       • time limitations: [after searching, hidden in the top menu]
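The operators from slides 17 to 20 can also be assembled programmatically, for instance to generate a series of related searches. The helper functions below are hypothetical and only build strings; they do not query Google, and the `https://www.google.com/search?q=` URL form is an assumption. As slide 4 warns, Google changes and drops functions over time, so operator support may differ from the 2013 situation; the `~` synonym operator is left out here.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def advanced_query(terms: Sequence[str], intitle: Optional[str] = None,
                   inurl: Optional[str] = None, sites: Sequence[str] = (),
                   filetype: Optional[str] = None,
                   number_range: Optional[tuple[str, str]] = None,
                   exclude: Sequence[str] = ()) -> str:
    """Assemble a query string with the operators listed on the slides."""
    parts = list(terms)
    if intitle:
        parts.append(f"intitle:{intitle}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if sites:
        # several sites are OR'ed together, e.g. site:last.fm OR site:spotify.com
        parts.append(" OR ".join(f"site:{site}" for site in sites))
    if filetype:
        parts.append(f"filetype:{filetype}")
    if number_range:
        low, high = number_range
        parts.append(f"{low}..{high}")  # e.g. 10..20 or $10..$20
    parts.extend(f"-{word}" for word in exclude)  # minus sign instead of NOT
    return " ".join(parts)


def google_search_url(query: str) -> str:
    """Percent-encode the query into a plain google.com search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


print(advanced_query(["guitar"], sites=["last.fm", "spotify.com"]))
print(google_search_url(advanced_query(["francis", "bacon"], exclude=["philosopher"])))
```

Building the string in the regular search box, as slide 19 recommends, gives the same result; the code merely makes the syntax explicit.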
    21. synonym search
    22. date limitations
    24. whoever searches for "Bach" is probably more interested in data about him than in websites about him, and most probably in J.S. rather than one of his relatives. Google's "Knowledge Graph" knows 500 million objects with 3.5 billion properties and even more mutual relations (but only in English)
    25. it also interprets the intention of your query (sometimes ;-)
    26. general search engines besides google
       • Bing: microsoft, large
       • Yahoo!: content = Bing, large
       • Blekko: uses slashtags to search more [domain-]selectively; also many predefined slashtags, e.g. /likes for Facebook
       • DuckDuckGo: assures privacy, no personalisation, no filter bubble; rather small; the !Bang function offers many extras
       • Gigablast: green search engine, rather small, some unique functions
       • Exalead: french, many advanced functions, primarily a demo system
       • Millionshort: leaves out results from the most popular sites → the long tail
       • WolframAlpha: knowledge engine, facts, calculations
       together, these others have a 30% market share in the US; in NL only 3%
       • Yandex: in Russia more popular than Google
       • Baidu: in China more popular than Google
       • Naver, Daum: in South Korea more popular than Google
       • Seznam: in Czechia more popular than Google
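Millionshort's idea of leaving out the most popular sites can be mimicked on any result list. This is an assumption-laden sketch: how Millionshort actually selects the sites it drops is not described on the slide, so a hand-made set of popular domains stands in for it, and the function name `skip_popular` is hypothetical.

```python
from urllib.parse import urlparse


def skip_popular(result_urls: list[str], popular_domains: set[str]) -> list[str]:
    """Drop results whose host is a popular domain or one of its subdomains."""
    kept = []
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        is_popular = any(host == domain or host.endswith("." + domain)
                         for domain in popular_domains)
        if not is_popular:
            kept.append(url)
    return kept


results = [
    "https://en.wikipedia.org/wiki/Johann_Sebastian_Bach",
    "https://www.youtube.com/results?search_query=bach",
    "http://www.bach-cantatas.com/",
]
print(skip_popular(results, {"wikipedia.org", "youtube.com"}))
```

Matching on the domain suffix means that language versions such as en.wikipedia.org are dropped together with the main domain, leaving the smaller "long tail" sites.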
    27. material type specific search
       • science: google scholar, microsoft academic, scirus, oaister, scientific commons, science.gov
       • reference: wikipedia, quora, wolfram|alpha, answers.com
       • news: google news, yahoo news, bing news, cnn, bbc
       • old news: way-back-machine, historische kranten KB
       • images: google image, yahoo image, bing image, flickr, tineye (ip-check), panoramio (geo-search)
       • video: google video, youtube, youtube edu channel, bing video, blinkx, voxalead-news
       • tweets: twitter search, topsy, postpost, snapbird
       • social: socialsearcher, socialmention, whostalkin, kurrently
       • forums: google groups, omgili, boardtracker
       • blogs: google blogs, icerocket
       • [rss]: CTRLQ, RSS SearchHub
    28. scientific search
       books
       – Google Books (full-text search)
       – Hathitrust Digital Library (open book scan project / part of Google Books)
       – Librarything (catalogue of 58,000,000 books from 1,000,000 owners)
       – GoodReads (reviews, recommendations, friends, ...)
       – Open Textbook Catalog (open access textbooks)
       journal articles
       – licensed databases (like JStor, ...)
       – Google Scholar (articles, dissertations, reports, ...)
       – sEURch / UvA-library ("discovery" systems of EUR / UvA)
       – Scirus / SciVerse (journal articles from Elsevier, database content, webpages)
       – Magportal (also popular magazines, in English)
       – DeepDyve (scientific articles "for rent" for 24 hours)
    29. Google Books
       • all pages scanned and full-text searchable
       • important for discovering specific subjects/terms that are not the primary topic of a book
       • often limitations on display and browsability (no preview / snippet view / limited preview / full view)
       • content from publishers and large libraries
       • problems with viewing copyrighted material, also from libraries
       • build your personal 'My Library'
       • Dutch books not only from Ghent University (and soon the KB), but also from US/UK libraries
       • also some 'magazines'
       • metadata on the about-this-book page
    30. Google Scholar
       • > 100 million scientific publications (mostly articles)
       • differences in availability (and hence searchability) of full text (majority), bibliographic data only, and citation data
       • competitor of Web of Science, Scopus, Scirus, ...
       • indexes many selected, even licensed, sources (publishers, abstract databases, university sites, institutional repositories, ...)
       • includes numbers of citations! [and links to them]
       • the number of citations is an important factor in relevance ranking (!! the reason why recent publications get low rankings)
       • limited advanced search; many mistakes in metadata (authors etc.)
       • accessibility of full text is often a problem because of licences
       • often many versions of the same article (sometimes including free ones)
       • coupling with library subscriptions allows smoother linking
       • no information about sources, updates etc.
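Slide 30 notes that the citation count weighs heavily in Google Scholar's ranking, which is why recent publications end up low. Scholar's actual formula is not public; the sketch below, with made-up papers and numbers, only shows the effect of ranking on raw citation counts and one simple correction you can apply yourself when judging results: citations per year since publication.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int
    citations: int


def by_citations(papers: list[Paper]) -> list[Paper]:
    """Rank on raw citation count: older papers have had more time to be cited."""
    return sorted(papers, key=lambda p: p.citations, reverse=True)


def by_citations_per_year(papers: list[Paper], current_year: int) -> list[Paper]:
    """Rank on citations per year since publication (counting the publication year)."""
    return sorted(papers,
                  key=lambda p: p.citations / max(1, current_year - p.year + 1),
                  reverse=True)


papers = [Paper("older study", 2003, 200), Paper("recent study", 2012, 40)]
print([p.title for p in by_citations(papers)])                 # older study first
print([p.title for p in by_citations_per_year(papers, 2013)])  # recent study first
```

With 200 citations over 11 years against 40 over 2 years, the recent paper is cited more intensively even though it loses on the raw count.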
    31. [annotated screenshot of a Google Scholar result] open access | number of citations: if this article is interesting, these 23 more recent ones probably are too | subscription Univ. Utrecht
    32. facts and reference
       encyclopedias
       – wikipedia
       – internet movie database
       – ...
       Q&A (human powered)
       – Quora
       – Yahoo-answers
       direct answers, facts and calculations
       – Wolfram|Alpha
       dictionaries, translations
       – answers.com (metasearch)
       – Roget thesaurus
       – Bartleby
       – Google Translate
       – Google Translated search
       – Synoniemen.net (dutch)
    33. wikipedia
       • >250 languages
       • "wisdom of the crowds" ?=? "wisdom" for all topics?
       • quite good for "factual" topics
       • many detailed specific topics (>20 million articles, >1 million in Dutch)
       • there are policies & guidelines & management: stewards, administrators
       • for searching wikipedia, use Google rather than the internal search: limiting to site:wikipedia.org gives more complete results and searches all language versions together
    34. Google's "translated search" is now almost hidden
    35. it translates the original query (here in English) into the chosen languages and translates the results back into English
    36. ... and pages selected from the result list are translated into English too
    37. old stuff: web & news
       • web archive
         – "way-back machine": old versions of websites, back to 1996; access through the original url, NO search; internal site links will mostly work
         – also other archived materials (e.g. music)
       • historical Dutch newspapers
         – historische kranten KB (1618-1995; full-text search)
       • historical international newspapers
         – British newspapers 1800-1900
         – historic American newspapers
         – international overview
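Since the way-back machine offers no search and is accessed through the original URL, it helps to know how its URLs are built. The sketch below only constructs URLs and makes no network requests; the patterns (`web.archive.org/web/<timestamp>/<url>`, the `*` overview of all captures, and the `archive.org/wayback/available` JSON endpoint) are the Internet Archive's public ones as far as known here and should be checked against its current documentation. The function names are hypothetical.

```python
from urllib.parse import urlencode


def wayback_snapshot_url(url: str, timestamp: str) -> str:
    """Archived copy of url closest to timestamp (YYYY, YYYYMM, ... up to YYYYMMDDhhmmss)."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


def wayback_captures_url(url: str) -> str:
    """Calendar overview of all archived captures of url."""
    return f"https://web.archive.org/web/*/{url}"


def wayback_availability_url(url: str, timestamp: str) -> str:
    """Request for the availability API, which answers with JSON about the closest snapshot."""
    return "https://archive.org/wayback/available?" + urlencode(
        {"url": url, "timestamp": timestamp})


print(wayback_snapshot_url("http://www.google.com/", "199802"))
print(wayback_captures_url("codarts.nl"))
print(wayback_availability_url("codarts.nl", "20130304"))
```

A partial timestamp such as 199802 is enough: the archive redirects to the nearest capture it holds.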
    38. … and the very oldest one, from February 1998:
    39. twitter & social search
       twitter search (often limited to messages from the past 1-2 weeks only)
       – twitter (also advanced search)
       – topsy (best one at the moment, also older messages)
       – postpost (search your own timeline: everything you're following)
       – snapbird (search through all tweets of a particular person; you have to know the twitter name)
       real time / social search
       – socialsearcher (facebook | twitter | g+ : side by side)
       – socialmention (also weblogs)
       – samepoint, whostalkin, kurrently, … (also weblogs)
       forum discussions
       – omgili, boardtracker, ...
       – Google groups
    44. multimedia search / images
       mostly search by keywords
       – Google-image (simple image recognition)
       – Yahoo-image (also pictures from Flickr)
       – Bing-image
       – Flickr (photo upload site; search on user tags; filter on "Creative Commons" material)
       – photographs on twitter (twicsy, picfog, topsy, skylines.io, …)
       – special sites (beeldbank nationaal archief, wikimedia commons, ...)
       special techniques:
       – geographical (panoramio [google-maps], worldc.am [instagram], ...)
       – Google (search by example)
       – Tineye (search for almost exact copies, e.g. to check for copyright infringement)
    46. image search
       content based image retrieval (CBIR)
       • search on colors; examples: Tineye, Chromatik, Picitup, Google, ...
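Search on colours is usually based on colour histograms. The sketch below is a generic illustration of that idea, not how Tineye, Chromatik or Google implement it: it counts pixels in coarse RGB cells and compares two images by histogram intersection. Images are assumed to be already decoded into lists of (r, g, b) tuples (an image library would normally do that), and the sample "images" are hypothetical.

```python
def color_histogram(pixels: list[tuple[int, int, int]], bins: int = 4) -> list[float]:
    """Normalised RGB histogram with `bins` levels per channel (bins**3 cells)."""
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        # map each 0-255 channel value to a level 0 .. bins-1
        cell = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[cell] += 1
    total = max(1, len(pixels))
    return [count / total for count in hist]


def histogram_intersection(h1: list[float], h2: list[float]) -> float:
    """Similarity from 0 (no shared colours) to 1 (identical colour distribution)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


red = [(250, 10, 10)] * 100
mostly_red = [(230, 20, 30)] * 80 + [(20, 20, 230)] * 20
blue = [(10, 10, 250)] * 100

h_red = color_histogram(red)
print(histogram_intersection(h_red, color_histogram(mostly_red)))
print(histogram_intersection(h_red, color_histogram(blue)))
```

Coarse cells make slightly different shades of red fall into the same cell, so the mostly red image scores high and the blue one scores zero; the layout of the picture plays no role at all, which is the main limitation of colour search.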
    47. image search
       content based image retrieval
       • search by example
         – draw it yourself: Retrievr, ...
         – existing image (example found on the web or uploaded from your own computer): Google (visually similar), Tineye (almost exact copies), Retrievr, ...
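Finding almost exact copies is often explained with perceptual hashing. The sketch below implements average hashing, one simple perceptual hash; it is an assumption that a service like Tineye works in a comparable way, since its method is not public. The input is assumed to be an 8 x 8 grayscale grid (an image library would normally do the shrinking), and the sample images are hypothetical.

```python
def average_hash(gray: list[list[int]]) -> list[int]:
    """One bit per pixel: 1 if brighter than the image's mean brightness."""
    flat = [value for row in gray for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(a: list[int], b: list[int]) -> int:
    """Number of differing bits; 0 means the images look (almost) identical."""
    return sum(x != y for x, y in zip(a, b))


# a hypothetical 8x8 grayscale image (a brightness gradient) and two variants
original = [[(row * 8 + col) * 4 for col in range(8)] for row in range(8)]
brighter_copy = [[min(255, v + 3) for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]

print(hamming_distance(average_hash(original), average_hash(brighter_copy)))  # 0
print(hamming_distance(average_hash(original), average_hash(inverted)))       # 64
```

Because every bit is a comparison with the image's own mean, a uniformly brightened copy keeps exactly the same hash, while a genuinely different picture differs in many bits; small edits such as a watermark change only a few.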
    48. example
    49. google looks for the most probable keywords to describe this image and already combines them with the image in the search box ... and how about these "visually similar images"?
    50. photoshopped advertisement, but what's the original?
    51. multimedia search / video
       (mostly) uploaded material
       – YouTube (growth: 70 hours per minute; also many "how to" videos); also: YouTube-channels / YouTube-education / YouTube-teachers / YouTube-movies / YouTube-shows / …
       – Vimeo
       (mostly) broadcast material
       – Blinkx (35 million hours of video, speech recognition?)
       – VoxaleadNews (speech recognition in several languages, also Dutch! hence "full-text" search on spoken words)
       – Bing-video (not easy to find from the European home page)
       – Google-video (also videos from YouTube; metadata search only)
       – Dutch TV programs:
         • Uitzending gemist (limited search functionality)
         • Beeld & Geluid (metadata search; use "uitgebreid zoeken", i.e. advanced search)
         • Academia (selection from Beeld & Geluid for higher education)
    52. ?
    53. the end: any questions?
