1. UEF // University of Eastern Finland
University of Eastern Finland Library
Heikki Laitinen
heikki.laitinen@uef.fi
2021
Information Retrieval in Health-Related
Natural Sciences
Applied physics, biomedicine, environmental sciences, medical engineering and
computing, nutrition, pharmacy, toxicology
Critical success factors in information retrieval
•Understanding the topic to be searched
•Ability to transform the topic into keywords and search queries
•Understanding that searching is a process
– Preparation, execution, evaluation of results
– Part of your research process
– Continuous process
•Knowing relevant information sources and how to use them
•Getting access to primary information
– Electronic and printed journals
– Electronic and printed books
The information retrieval problem
•How can you find all needed information, maximizing the
number of relevant search results and minimizing the number
of irrelevant ones?
(Figure legend: marked items = relevant items of information not retrieved)
Search recall and precision
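• Recall (sensitivity): number of relevant results retrieved / number of all relevant items in the database
• Precision (specificity): number of relevant results retrieved / number of all results retrieved
• Recall and precision tend to be inversely related
– In a comprehensive search (high recall), you also get irrelevant results
(low precision), and vice versa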
•What are the reasons for this phenomenon?
– A search query can be seen as a model of the search topic, consisting of
keywords and defined relationships between keywords. This model
represents the topic incompletely.
– Even scientific language is not exact. Keywords may be inaccurate
and have multiple meanings.
– An information retrieval system finds only character strings, not the meanings
that are essential to the searcher.
•But:
– If all search keywords are exact and have only one meaning, then
both recall and precision may be 100%.
– E.g.: find ADI (acceptable daily intake) for benzoic acid.
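The two measures can be made concrete with a small calculation. The sketch below assumes the standard definitions (recall = relevant results retrieved / all relevant items in the database; precision = relevant results retrieved / all results retrieved). The function name and the item labels are invented for illustration.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved and a set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that the search actually found
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical database with 4 relevant items; the search returned 5 items, 3 of them relevant.
relevant = {"A", "B", "C", "D"}
retrieved = {"A", "B", "C", "X", "Y"}
recall, precision = recall_and_precision(retrieved, relevant)
print(f"recall = {recall:.2f}, precision = {precision:.2f}")  # recall = 0.75, precision = 0.60
```

Broadening this hypothetical search so that it also finds item "D" would raise recall, but it would usually add more irrelevant items as well and so lower precision.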
How to do a literature search
•Select suitable literature reference database(s)
– UEF Primo -> Database search -> enter the database name
– UEF library home page -> Short cuts -> Frequently used databases
•Define search strategy and formulate search queries
•Search database(s)
– Iterative process: review initial search results and modify your search,
if necessary
•Save search results to a reference management program, such
as RefWorks
•Select relevant references and obtain complete articles
How to obtain complete articles
• Use "Full text availability at UEF" links in RefWorks
• Use "Find It" links in database search results
• Try UEF Primo (printed and e-journals)
• Try Unpaywall
– Free plug-in for Firefox and Chrome, available from https://unpaywall.org
– Finds legal open access versions from thousands of repositories worldwide
– How to use – see the video: https://www.youtube.com/watch?v=FXJMKwql_dc
• Try CORE (https://core.ac.uk/) to find a legal open access version
• Search Google for a possible "journal free trial"
• Request via UEF Library Interlibrary Service (subject to charges)
• Read the blog:
https://blogs.uef.fi/ueflibrary/ilmaisia-artikkeleita-laillisesti-free-articles-legally/
Search strategies
•Quick search (also called "quick and dirty")
– Think up some words about your topic and search as if you were using Google
– Useful when you want to find something quickly, need only a few
publications, want to familiarize yourself with your topic, etc.
– Unfortunately, too many people use only this strategy in all their searches
("Google effect")
– How can you be sure that you found all relevant publications?
•Block search
– Analyze your topic carefully (main concepts and sub-concepts)
– Find keywords that describe your concepts as precisely as possible
– Create search queries by combining keywords with Boolean operators
– Useful in situations where your topic is well-defined and you want
comprehensive search results
•Citation pearl growing
– Useful when your topic is ambiguous and it is difficult to find keywords
– Start with one known relevant article
– Search a database to find the article reference
– Examine the reference to find suitable keywords, and do the actual search with
these keywords
– Some databases display "related articles" links in search results to make pearl
growing easy
•Citation searching
– Based on the assumption that there is a relationship between a publication’s
content and the citations in that publication
– E.g. you have an old "classic" journal article about your topic and want to find
newer ones citing the classic
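Citation searching can be pictured as a reverse lookup in reference lists. The sketch below uses an invented toy citation index; the article labels and the function name are hypothetical, and real citation databases such as Scopus or Web of Science do this lookup for you.

```python
# Hypothetical citation data: each article maps to the list of articles it cites.
cites = {
    "classic_1995": [],
    "study_2010": ["classic_1995"],
    "review_2018": ["classic_1995", "study_2010"],
    "study_2020": ["review_2018"],
}

def citing_articles(target, cites):
    """Return the articles whose reference lists contain the target article."""
    return sorted(article for article, refs in cites.items() if target in refs)

print(citing_articles("classic_1995", cites))  # ['review_2018', 'study_2010']
```

Repeating the lookup for each newer article found (here, for "review_2018") follows the citation chain forward in time.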
How to formulate a search query
• A search query is the search topic (natural language) translated into the database
search language
• Define your topic
• Write a short description of the topic
• Define concepts and their relationships
– Mind map or tabular form
– Dependence or cause-effect: "Effect of A on B" (Boolean AND in search)
– Parallelism: "C and D are included in B" (Boolean OR in search)
– Exclusion: "M, but without N" (Boolean NOT in search)
• Find keywords corresponding to concepts
– Free-text keywords or thesaurus-based keywords
• Combine keywords to create search query or queries
• Usually, several queries are needed even if there is only one search topic
Mind map example
Tabular form example
• Topic: Workers’ exposure to organophosphates and carbamates through skin
Free-text keywords
• Words which appear in titles and abstracts of publications
– Think them up yourself, look them up in dictionaries, use Google, etc.
– Must be in the language used in the database (mostly English)
• Use truncation to retrieve different wordings, singular, plural, etc.
– pollut* -> pollution, polluted, pollutant, pollutants, etc.
• Use synonyms if you want to do a comprehensive search
– Dust OR "fine particles" OR "particulate matter" (place phrases in quotes)
– UK English / US English (sulphur, sulfur, "coeliac disease", "celiac disease", etc.)
• Homonyms (words with multiple meanings) produce false search results
– E.g. there are 8 different meanings for "administration" in MOT dictionary
– Use the complete phrase instead of an abbreviation if the abbreviation is ambiguous
– NLP: "neuro-linguistic programming" or "natural language processing"?
• Use free-text keywords when:
– Your topic is specific / precisely defined
– Terminology of the topic is well established and unambiguous
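The effect of truncation can be tried out on a word list. This is a minimal sketch that imitates a trailing asterisk with Python's standard `fnmatch` module; the function name and the word list are invented, and real databases may set their own limits on truncation (for example a minimum number of characters before the asterisk).

```python
from fnmatch import fnmatchcase

def matches_truncated(keyword, word):
    """Case-insensitive test of a word against a keyword truncated with '*'."""
    return fnmatchcase(word.lower(), keyword.lower())

words = ["pollution", "polluted", "pollutant", "pollutants", "pollen", "Pollute"]
print([w for w in words if matches_truncated("pollut*", w)])
# ['pollution', 'polluted', 'pollutant', 'pollutants', 'Pollute']
```

Truncating too early widens the match: "poll*" would also retrieve "pollen", which is a quick way to lose precision.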
Thesaurus-based keywords
• Also known as descriptors or subject headings
• A thesaurus is a structured, subject-specific list of descriptors
• Descriptors are standardized words (controlled vocabulary) used in
information storage and retrieval
– Subjects of publications are described with descriptors in the database
– The same descriptors may be used in searching
– A link to the thesaurus is often on the database search page
• Descriptors highlight the essential content of a publication, regardless of
words in title and abstract
• Use of descriptors can increase both search recall and precision
• The most recent concepts may not yet have descriptors
– You can combine descriptors and free-text keywords
• Medical Subject Headings (MeSH) is the most important thesaurus in
health sciences and related natural sciences
• Use MeSH in PubMed search queries
Combining keywords
• Boolean operators OR, AND, NOT
• Keywords and Boolean operators may be typed either in lower or UPPER
case (type operators always in upper case when searching PubMed database)
• Use OR to combine synonyms: eye OR ocular
• Use AND to join dependent concepts: soil AND pollution
• Use NOT to exclude concepts: solvents NOT (ethanol OR methanol)
• It is possible to use several operators in one query
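– In many databases ANDs are executed first by default, then ORs, but the order varies between databases (PubMed, for example, processes operators from left to right)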
• If there are both AND and OR operators in a query, use parentheses to
override default execution and retain correct search syntax
• Operators in parentheses are executed first
• (dust OR particle* OR particulate*) AND atmospher*
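The block structure of such a query (synonyms joined with OR inside parentheses, blocks joined with AND) can be generated mechanically. The following is a sketch only; `build_block_query` is a hypothetical helper, not a feature of any database, and it assumes that multi-word keywords should be quoted as phrases.

```python
def build_block_query(blocks):
    """Join synonyms inside each block with OR, then join the blocks with AND."""
    parts = []
    for synonyms in blocks:
        # Multi-word keywords are phrases and are placed in quotes.
        terms = [f'"{s}"' if " " in s else s for s in synonyms]
        joined = " OR ".join(terms)
        # Parentheses are only needed when a block holds more than one keyword.
        parts.append(f"({joined})" if len(terms) > 1 else joined)
    return " AND ".join(parts)

blocks = [["dust", "particle*", "particulate*"], ["atmospher*"]]
print(build_block_query(blocks))
# (dust OR particle* OR particulate*) AND atmospher*
```

Keeping each concept as its own list makes it easy to add or drop synonyms when the first results show that recall or precision needs adjusting.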
Boolean logic
Search query example 1
• Thesaurus-based search in PubMed database
• Topic: workers’ exposure to pesticides
• Concepts
– Workers
– Exposure
– Pesticides
• Medical Subject Headings
– Occupational Exposure
– Pesticides
• Search query
"Occupational Exposure"[mesh] AND Pesticides[mesh]
Search query example 2
• Free-text keyword search in Scopus database
• Topic: health effects of molds in indoor air
• Concepts and free-text keywords
– indoor (air)
– mold, mould
– health effects (also: effects on health, etc.)
• Search query
indoor AND (mold* OR mould*) AND (health* W/2 effect*)
• Note parentheses, keyword truncation and proximity operator W/2
– Keywords may be in either order but no more than the specified number of
words apart
– The number may be chosen freely; usually 1-4 is appropriate
Search query example 3
• Boolean AND is not the same as "and" in natural language
• Topic is expressed as: "big data and data mining in nutrigenetics, in
nutrigenomics, in pharmacogenetics and in pharmacogenomics"
• The information actually needed is:
– Either about big data or about data mining, or about both topics together
– Related to nutrition or to pharmacy, or to both
• Thus the search query is:
("big data" OR "data mining") AND (nutrigen* OR pharmacogen*)
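The difference between the two readings can be shown with set operations, where Boolean AND is an intersection and Boolean OR is a union. The article ID sets below are invented for illustration.

```python
# Hypothetical sets of article IDs indexed under each keyword.
big_data = {1, 2, 3}
data_mining = {3, 4}
nutrigen = {2, 4, 5}
pharmacogen = {1, 6}

# Literal reading: every "and" in the topic becomes a Boolean AND.
literal = big_data & data_mining & nutrigen & pharmacogen

# Intended reading: (big data OR data mining) AND (nutrigen* OR pharmacogen*).
intended = (big_data | data_mining) & (nutrigen | pharmacogen)

print(sorted(literal))   # []
print(sorted(intended))  # [1, 2, 4]
```

In this toy data the literal reading retrieves nothing, because no single article covers all four keywords at once.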
What to do if you get
•Too few search results or no results at all
– Increase search recall (sensitivity), accepting lower precision (specificity)
– Check spelling of keywords (e.g. naphthalene instead of naphtalene)
– Use broader (more general) keywords
– Use fewer AND operators
– Find synonymous keywords and combine them with OR operators
– Truncate keywords
•Too many search results
– Increase search precision (specificity), accepting lower recall (sensitivity)
– Use narrower (more precise) keywords
– Add more keywords with AND operators
– Use fewer synonyms (fewer OR operators)
– Limit your search: keywords in titles only; the most recent references only;
English only; review articles only
PubMed
• The essential journal article reference database in health sciences and
related areas, produced by the U.S. National Library of Medicine
• Freely accessible on the Internet
• Contains more than 27 million references from about 5600 journals,
starting in the 1950s
• Search techniques:
– Enter complete query in Search box
– Use Advanced search to find individual concepts first, then combine these search sets
to produce the final result
• Search results (references) are linked to full texts with Find It links
– Click the article title in the search result list and see the link in the top right corner of the abstract
• You can set alerts (automatic e-mail notification of new articles)
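A complete query can also be sent to PubMed programmatically. The slides do not cover this; the sketch below assumes NCBI's E-utilities ESearch service and only builds the request URL, without sending it. The function name `pubmed_search_url` is hypothetical, and the query is the MeSH example used earlier in these slides.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed endpoint: the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(query, max_results=20):
    """Build an ESearch URL for a PubMed query; the request itself is not sent here."""
    params = {"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"

url = pubmed_search_url('"Occupational Exposure"[mesh] AND Pesticides[mesh]')
print(url)

# Decoding the URL again shows that the query survives the encoding unchanged.
print(parse_qs(urlparse(url).query)["term"][0])
```

The quotes, brackets and spaces of the query are percent-encoded in the URL, which is why the query should be built as plain text first and encoded in one step.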
Scopus
•Multidisciplinary literature reference database, produced by
Elsevier, the Netherlands
– About 23 600 peer-reviewed journals, of which 3 600 are open access journals
– More than 67 million journal article references
– References to conference papers, books and patents
– Bibliometric indicators such as number of citations and author h-index
•Disciplines
– Biosciences and pharmacy (about 4 300 journals)
– Health sciences (about 6 800 journals)
– Natural sciences (about 7 200 journals)
– Social sciences and humanities (about 5 300 journals)
• The search language is English; you may use lower-case or capital letters
• Search results are linked to full text articles
• Search results may be exported directly to RefWorks
• Automatic e-mail alerting of new publications is possible
• Use asterisk to find different variations of keywords (truncation)
– metabol*
– *tocopherol* (also at the beginning of a keyword)
– sul*ur (also inside a keyword)
• You may also use question marks to replace letters
– wom?n
– Number of question marks equals number of characters replaced
• Searches words in titles, abstracts and keywords by default
• Use Boolean operators (AND, OR, AND NOT) to combine keywords
– (sweeten* AND NOT (sugar* OR sucrose)) AND diabet*
• Use braces to search for exact phrases
– {to be or not to be}
• Use quotes to search for broader phrases
– "climat* chang*" (truncation of phrase words is possible)
• If braces or quotes are missing, keywords are combined with AND by
default
– orange juice -> orange AND juice
• Proximity operators
– Define how close to each other your keywords must be
– Adjust search precision: broader than phrase, narrower than Boolean AND
– W/n (n=0-255) : There may be at most n other words between keywords,
keywords may be in given order or in inverted order
– PRE/n (n=0-255) : There may be at most n other words between keywords,
keywords must be in given order only
• E.g. effect of climate change on agricultural productivity in Europe:
"climate change" AND "agricultural productivity" AND europe -> 31 results
climat* W/3 chang* AND agricultur* W/3 productiv* AND europe* -> 96 results
climat* AND chang* AND agricultur* AND productiv* AND europe* -> 420 results
(searched in September 2021)
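The difference between W/n and PRE/n can be imitated on a single title. This is a simplified sketch, not Scopus's actual matching logic: it assumes plain word tokenization, ignores stop-word handling, and supports truncation only at the end of a keyword. The function names and the sample title are invented.

```python
import re

def positions(tokens, keyword):
    """Indexes of tokens matching a keyword; a trailing '*' means truncation."""
    kw = keyword.lower()
    if kw.endswith("*"):
        return [i for i, t in enumerate(tokens) if t.startswith(kw[:-1])]
    return [i for i, t in enumerate(tokens) if t == kw]

def within(text, first, second, n, ordered=False):
    """True if the keywords occur with at most n other words between them.

    ordered=False imitates W/n (either order); ordered=True imitates PRE/n.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for i in positions(tokens, first):
        for j in positions(tokens, second):
            if i == j:
                continue
            gap = abs(j - i) - 1  # number of words between the two keywords
            if gap <= n and (j > i or not ordered):
                return True
    return False

title = "Changes in European climate and crop yields"
print(within(title, "climat*", "chang*", 3))                # W/3   -> True
print(within(title, "climat*", "chang*", 3, ordered=True))  # PRE/3 -> False
```

The title matches with W/3 because "changes" and "climate" are two words apart, but not with PRE/3 because "changes" comes before "climate".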
Scopus search example
Sense of coherence in eating disorders
Scopus search example
Health effects of gut microbiota in humans
Web of Science
• Multidisciplinary journal article reference database, produced by
Clarivate Analytics, USA
• Citation counts and author h-index available
• Web of Science at the University of Eastern Finland:
– Science Citation Index Expanded 1975-
• Fully covers 8300 journals (150 disciplines)
– Social Sciences Citation Index 1975-
• Fully covers 2900 journals (50 disciplines)
– Arts & Humanities Citation Index 1975-
• Fully covers 1600 journals
• Ordinary searching is possible by: topic, author, journal name,
publication year, author address
– Topic = title words, abstract words, free-text keywords
• There is no thesaurus in Web of Science
• Search results are linked to full text articles
• Search results may be exported directly to RefWorks
• Analysis of search results
– To find e.g. research institutions, authors, or journals in a certain subject area
– Do a broad (high recall) search to get a statistically meaningful analysis
– Click "Analyze Results" at the top of the search results screen
– Select analyzing parameters
• Use asterisk to truncate and replace 0-n characters
– metabol*
– *tocopherol*
– sul*ur
• Use question mark to replace exactly one character
– wom?n
• Use dollar sign to replace 0-1 characters
– colo$r
• You may include several truncation characters in a keyword
– organi?ation*
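The three wildcard characters can be understood by translating them into a regular expression. The sketch below follows the rules listed above (asterisk = 0-n characters, question mark = exactly one character, dollar sign = 0-1 characters); the function names are hypothetical, and the behaviour of the real database may differ in details such as minimum keyword length.

```python
import re

def wildcard_to_regex(keyword):
    """Translate the wildcards *, ? and $ into a regular expression."""
    rules = {"*": r"\w*", "?": r"\w", "$": r"\w?"}
    pattern = "".join(rules.get(ch, re.escape(ch)) for ch in keyword)
    return re.compile(pattern, re.IGNORECASE)

def matches(keyword, word):
    """True if the whole word matches the wildcard keyword."""
    return wildcard_to_regex(keyword).fullmatch(word) is not None

examples = [("wom?n", "women"), ("wom?n", "womn"), ("colo$r", "colour"),
            ("colo$r", "color"), ("sul*ur", "sulphur"),
            ("organi?ation*", "organizations")]
for keyword, word in examples:
    print(keyword, word, matches(keyword, word))
```

Only the second pair fails to match, because a question mark must stand for exactly one character.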
• Use quotes to search for phrases
– "heart attack*" (phrase truncation is possible)
• If quotes are missing, keywords are combined with AND by default
– orange juice -> orange AND juice
• Boolean operators (AND, OR, NOT) may be either selected from
menus or written directly into the query
– (sweeten* NOT (sugar* OR sucrose)) AND diabet*
• Proximity operator
– NEAR/n : There may be at most n other words between keywords,
keywords may be in given order or in inverted order
– NEAR (without a number) : There may be at most 15 other words
between keywords (equivalent to NEAR/15)
– (indoor* NEAR/2 pollut*) AND (voc OR volatile)
• Use proximity operators to adjust search precision